Execution time of Texas C6713 versus SHARC 21262

Started by phil January 7, 2005
Hello all,

I'm currently evaluating some DSPs for an upcoming project which
will include the development of a hardware platform and the port of
software currently running on Mac and PC platforms. I've written a
non-optimized piece of code (in C++) and ran it with the simulators
provided by Texas and Analog. I got about 74000 cycles for the 6713 and
32000 for the 21262. I don't want to (re)start a war between pro-TI and
pro-Analog, but this ratio appears quite surprising and I'm wondering
if anyone has already run identical C programs on both targets. Did
you encounter such differences?

Following is the code of a function that is called 8 times:

void DoPitch(int inLength, float inPitch, float inOffset, float* bufin,
             float* outBuf)
{
    float pitch = inPitch;
    float index = inOffset;
    register float* bufout = outBuf;
    register float* topout = bufout + inLength;
    while (bufout < topout) {
        int idx = (int)index;
        float k = index - (float)idx;
        float val1 = bufin[idx];
        float val2 = bufin[idx + 1];
        *bufout++ = k * (val2 - val1) + val1;
        index += pitch;
    }
}
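One detail of the function above matters when setting up a benchmark: the loop reads bufin[idx + 1], so the input buffer has to extend one sample past the highest index touched. A minimal sketch of a helper that replays the same index arithmetic to find the required input length follows. The name RequiredInputSamples is hypothetical and not part of the posted code, and the sketch assumes inPitch and inOffset are non-negative.

```c
#include <assert.h>

/* Hypothetical helper, not part of the original post.
   Replays the index accumulation of DoPitch and returns how many
   samples bufin must hold so that bufin[idx + 1] stays in bounds.
   Assumes inPitch >= 0 and inOffset >= 0. */
int RequiredInputSamples(int inLength, float inPitch, float inOffset)
{
    float index = inOffset;
    int maxIdx = -1;
    int i;

    assert(inPitch >= 0.0f && inOffset >= 0.0f);

    for (i = 0; i < inLength; i++) {
        int idx = (int)index;   /* same truncation as in DoPitch */
        if (idx > maxIdx)
            maxIdx = idx;
        index += inPitch;       /* same accumulation as in DoPitch */
    }

    /* +1 turns the highest index into a count, +1 more covers idx + 1. */
    return (inLength > 0) ? maxIdx + 2 : 0;
}
```

For example, 4 output samples at a pitch of 1.5 starting from offset 0 touch indices 0, 1, 3 and 4, plus the neighbour at index 5, so the input buffer needs 6 samples.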

"phil" <pwicker@mac.com> wrote in news:1105115854.912487.298940
@z14g2000cwz.googlegroups.com:

> I got about 74000 cycles for the 6713 and 32000 for the 21262.
> Did you encounter such differences?

First of all, I am an ADI partisan. I can't tell you if your benchmark
is representative since I never use the TI part.

Here are some additional things to consider with the SHARC:

1. Assembly language programming is very easy. It is very difficult
   with the TI. This allows you to write go-fast code where you need it
   even if much of your application is written in C.

2. If you need more MIPS, the 21364 is also available. It runs 1.66 x
   faster than the 21262 and uses the same pinouts and footprint.

3. We have a variety of boards that support the 21261, 21262 and 21364.
   Check out our web site or call us if you are interested.

-- 
Al Clark
Danville Signal Processing, Inc.
--------------------------------------------------------------------
Purveyors of Fine DSP Hardware and other Cool Stuff
Available at http://www.danvillesignal.com

"phil" <pwicker@mac.com> wrote in message
news:1105115854.912487.298940@z14g2000cwz.googlegroups.com...
> I got about 74000 cycles for the 6713 and 32000 for the 21262.
> Did you encounter such differences?

Coming from the TI camp here. Experience with the 6713 dictates you
need to use a lot of special programming techniques (code architecture,
use of #pragmas, etc) to get the MIPS out of that part as advertised.
This is based on my experience with their C compiler. They have an
extensive user's guide dedicated to such tasks, and once you start
using these tricks, the results are dramatic. I should comment that I
haven't used their C++ compiler at all.

If potential part count is high, perhaps someone at TI will quickly
optimize and create a benchmark for your function.

-Shawn

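As an illustration of the kind of #pragma hint meant here, the following is a sketch of the posted loop with TI's MUST_ITERATE pragma, which tells the C6000 compiler the minimum trip count of the loop that follows it. Nothing in the thread says which pragmas were actually used, so this is an assumption. The name DoPitchHinted is hypothetical, the caller is assumed to guarantee inLength >= 1, and no cycle counts were measured for this version. The pragma is guarded by the TI compiler's predefined __TI_COMPILER_VERSION__ macro so that other compilers skip it.

```c
#include <assert.h>

/* Hypothetical variant of DoPitch with a trip-count hint.
   Assumes the caller guarantees inLength >= 1. */
void DoPitchHinted(int inLength, float inPitch, float inOffset,
                   const float *bufin, float *outBuf)
{
    float index = inOffset;
    int i;

    assert(inLength >= 1);

#ifdef __TI_COMPILER_VERSION__
    /* TI C6000 only: promise that the loop body runs at least once. */
    #pragma MUST_ITERATE(1)
#endif
    for (i = 0; i < inLength; i++) {
        int idx = (int)index;                 /* integer part of the read position */
        float k = index - (float)idx;         /* fractional part */
        float val1 = bufin[idx];
        float val2 = bufin[idx + 1];
        outBuf[i] = k * (val2 - val1) + val1; /* linear interpolation */
        index += inPitch;
    }
}
```

The counted for loop replaces the pointer-compare while loop of the original only to make the trip count explicit; the arithmetic per sample is unchanged.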
Shawn Steenhagen wrote:
> Experience with the 6713 dictates you need to use a lot of special
> programming techniques (code architecture, use of #pragmas, etc) to get
> the MIPS out of that part as advertised.

Yes, you're right. It appears that their compiler is not very good at
handling "standard" C/C++ code. I've tried some of the tips suggested
by the consultant tool (e.g. the restrict keyword) and the cycle count
fell to 20000 (from 74000). The same trick used with the 21262 gives
32000 down to 29000. With a TigerSHARC, the number of cycles goes from
38000 down to 22000.

The real problem for us is that the benchmarks run on the real code
(the one that has to be ported) show that we'll need to heavily rework
the time-consuming methods and likely rewrite them in assembler. The
Texas assembler looks quite difficult to use. I still have to determine
the learning time of the different assemblers, estimate the
optimization ratio assembler/C and estimate the time needed to rewrite
parts of the original code.

Philippe

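The modified source was not posted, so the following is only a sketch of what the restrict tip might look like when applied to DoPitch. The name DoPitchRestrict is hypothetical. The sketch uses the C99 spelling of the keyword; the original code is C++, where the spelling depends on the compiler (some accept __restrict instead), so treat that part as an assumption about the toolchain.

```c
#include <assert.h>

/* Hypothetical restrict-qualified variant of DoPitch (C99).
   The qualifiers promise the compiler that bufin and outBuf never
   overlap, so a store to outBuf cannot change a later load from bufin
   and the compiler is free to overlap loop iterations. */
void DoPitchRestrict(int inLength, float inPitch, float inOffset,
                     const float *restrict bufin, float *restrict outBuf)
{
    float index = inOffset;
    int i;

    assert(inLength >= 0);

    for (i = 0; i < inLength; i++) {
        int idx = (int)index;                 /* integer part of the read position */
        float k = index - (float)idx;         /* fractional part */
        float val1 = bufin[idx];
        float val2 = bufin[idx + 1];
        outBuf[i] = k * (val2 - val1) + val1; /* linear interpolation */
        index += inPitch;
    }
}
```

The promise is the caller's responsibility: passing overlapping buffers to a function declared this way is undefined behaviour, which is why the compiler cannot assume it for the original signature.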
Hello Philippe,

> still have to determine the learning time of the different assemblers,
> estimate the optimization ratio assembler/C and estimate the time
> needed to rewrite parts of the original code.

My comment, highly ADI biased: go for the SHARCs. As you could see, it
gives you the possibility of optimizing your C code via compiler flags,
#pragmas, etc. Besides that, you have a very good roadmap towards
performance, among other metrics.

And also, whenever you need to take advantage of every single clock
cycle, you can write assembly code by hand, which is REALLY easy with
the SHARC parts. Think of its assembly language as a "low level C".

Regards,

JaaC

"phil" <pwicker@mac.com> wrote in news:1105425839.141033.38840
@c13g2000cwb.googlegroups.com:

> I still have to determine the learning time of the different assemblers,
> estimate the optimization ratio assembler/C and estimate the time
> needed to rewrite parts of the original code.

Don't disregard the ADSP-21364 if you are truly MIPS limited. This part
is pin compatible, code compatible and 1.67 x faster than the 21262.

Have you actually taken your benchmark and coded it in ADI assembly?
Your example looks pretty easy to code. I think this would be an
interesting benchmark as well. We write virtually all of our code in
assembly. It is no harder to write than C on a SHARC.

There are many intangibles when comparing DSPs (and companies). I have
personally worked with ADI DSPs for over 10 years. The ADI parts have
worked well, parts were delivered when I needed them (not always true
of other suppliers during this period) and support has been excellent.

Danville has taken this position to an extreme: We have built our
company around ADI DSP, particularly the SHARC. You can check out our
roadmap (and ADI's) at this link:

http://www.danvillesignal.com/index.php?id=roadmap

-- 
Al Clark
Danville Signal Processing, Inc.
--------------------------------------------------------------------
Purveyors of Fine DSP Hardware and other Cool Stuff
Available at http://www.danvillesignal.com

Hello,

Al Clark wrote:

> Danville has taken this position to an extreme: We have built our
> company around ADI DSP, particularly the SHARC.

This is a good point. Some others have done it, too. Bittware is a good
example. And in South America, SanJaaC Electronics is trying to follow
these steps!

------------------------------
Jaime Andrés Aranguren Cardona
jaac@sanjaac.com
SanJaaC Electronics
Soluciones en DSP
www.sanjaac.com