"phil" <pwicker@mac.com> wrote in news:1105425839.141033.38840
@c13g2000cwb.googlegroups.com:

> Yes, you're right. It appears that their compiler is not very good at
> handling "standard" C/C++ code. I've tried some of the tips suggested
> by the consultant tool (e.g. the restrict keyword) and the cycle count
> fell to 20000 (from 74000). The same trick used with the 21262
> gives 32000 down to 29000. With a TigerSHARC, the number of cycles goes
> from 38000 down to 22000. The real problem for us is that the benchmarks
> run on the real code (the code that has to be ported) show that we'll
> need to heavily rework the time-consuming methods and likely rewrite
> them in assembler. The Texas assembler looks quite awkward to use. I
> still have to determine the learning time of the different assemblers,
> estimate the optimization ratio assembler/C and estimate the time
> needed to rewrite parts of the original code.
> 
> Philippe
> 

Don't disregard the ADSP-21364 if you are truly MIPS-limited. This part is 
pin-compatible and code-compatible with the 21262, and 1.67x faster.

Have you actually taken your benchmark and coded it in ADI assembly? Your 
example looks pretty easy to code. I think this would be an interesting 
benchmark as well. 

We write virtually all of our code in assembly. On a SHARC it is no harder 
to write than C. 

There are many intangibles when comparing DSPs (and companies). I have 
personally worked with ADI DSPs for over 10 years. The ADI parts have 
worked well, parts were delivered when I needed them (not always true of 
other suppliers during this period) and support has been excellent. 

Danville has taken this position to an extreme: we have built our company 
around ADI DSPs, particularly the SHARC. You can check out our roadmap (and 
ADI's) at this link:

http://www.danvillesignal.com/index.php?id=roadmap

-- 
Al Clark
Danville Signal Processing, Inc.
--------------------------------------------------------------------
Purveyors of Fine DSP Hardware and other Cool Stuff
Available at http://www.danvillesignal.com

Hello Philippe,

> still have to determine the learning time of the different assemblers,
> estimate the optimization ratio assembler/C and estimate the time
> needed to rewrite parts of the original code.

My comment, highly ADI-biased: go for the SHARCs. As you have seen, they
give you the possibility of optimizing your C code via compiler flags,
#pragmas, etc. Besides that, you have a very good roadmap towards
higher performance, among other metrics. Also, whenever you need to take
advantage of every single clock cycle, you can write assembly code by
hand, which is REALLY easy with the SHARC parts. Think of their assembly
language as a "low-level C".

Regards,

JaaC

Shawn Steenhagen wrote:
> Coming from the TI camp here.  Experience with the 6713 dictates you need to
> use a lot of special programming techniques (code architecture, use of
> #pragmas, etc.) to get the advertised MIPS out of that part.  This is
> based on my experience with their C compiler.  They have an extensive user's
> guide dedicated to such tasks, and once you start using these tricks, the
> results are dramatic.  I should comment that I haven't used their C++
> compiler at all.
>
> If potential part count is high, perhaps someone at TI will quickly optimize
> and create a benchmark for your function.
>
> -Shawn

Yes, you're right. It appears that their compiler is not very good at
handling "standard" C/C++ code. I've tried some of the tips suggested
by the consultant tool (e.g. the restrict keyword) and the cycle count
fell to 20000 (from 74000). The same trick used with the 21262
gives 32000 down to 29000. With a TigerSHARC, the number of cycles goes
from 38000 down to 22000. The real problem for us is that the benchmarks
run on the real code (the code that has to be ported) show that we'll
need to heavily rework the time-consuming methods and likely rewrite
them in assembler. The Texas assembler looks quite awkward to use. I
still have to determine the learning time of the different assemblers,
estimate the optimization ratio assembler/C and estimate the time
needed to rewrite parts of the original code.

Philippe
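
Philippe did not post the modified source, so the following is only a sketch of what a restrict-qualified version of the benchmark might look like. The name DoPitchRestrict, the const qualifier and the counted for loop are assumptions added for illustration; the only change mentioned in the thread is the restrict keyword. The qualifier is a promise to the compiler that bufin and outBuf never overlap, which lets it schedule loads ahead of earlier stores.

```c
#include <assert.h>

/* Hypothetical restrict-qualified variant of DoPitch (C99).
   Assumption: bufin and outBuf never overlap, and bufin holds at least
   (int)(inOffset + (inLength - 1) * inPitch) + 2 samples. */
void DoPitchRestrict(int inLength, float inPitch, float inOffset,
                     const float* restrict bufin, float* restrict outBuf)
{
    float index = inOffset;          /* fractional read position in bufin */
    int i;

    assert(inLength >= 0);
    assert(bufin != 0 && outBuf != 0);

    for (i = 0; i < inLength; ++i) {
        int idx = (int)index;                  /* integer part */
        float k = index - (float)idx;          /* fractional part */
        float val1 = bufin[idx];
        float val2 = bufin[idx + 1];
        outBuf[i] = k * (val2 - val1) + val1;  /* linear interpolation */
        index += inPitch;
    }
}
```

Called with a ramp input {0, 1, 2, 3, 4, 5}, a length of 4, a pitch of 1.25 and an offset of 0, this should write 0, 1.25, 2.5 and 3.75 to the output buffer, the same values the original function produces.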

"phil" <pwicker@mac.com> wrote in message
news:1105115854.912487.298940@z14g2000cwz.googlegroups.com...

Coming from the TI camp here.  Experience with the 6713 dictates you need to
use a lot of special programming techniques (code architecture, use of
#pragmas, etc.) to get the advertised MIPS out of that part.  This is
based on my experience with their C compiler.  They have an extensive user's
guide dedicated to such tasks, and once you start using these tricks, the
results are dramatic.  I should comment that I haven't used their C++
compiler at all.

If potential part count is high, perhaps someone at TI will quickly optimize
and create a benchmark for your function.

-Shawn
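
Shawn does not say which #pragmas he has in mind. As one possible illustration, the sketch below adds a trip-count hint of the kind described in TI's C6000 compiler guide. Everything here is an assumption rather than something from the thread: the name DoPitchHinted, the requirement that inLength is a positive multiple of 4, and the choice of MUST_ITERATE as the pragma. The hint is guarded by the TI-predefined macro _TMS320C6X so that other compilers never see it.

```c
#include <assert.h>

/* Hypothetical variant of DoPitch with a TI C6000 trip-count hint.
   Assumption (not from the thread): inLength is a positive multiple of 4. */
void DoPitchHinted(int inLength, float inPitch, float inOffset,
                   const float* restrict bufin, float* restrict outBuf)
{
    float index = inOffset;
    int i;

    /* The hint below is a promise to the compiler; check it in debug builds. */
    assert(inLength >= 4 && inLength % 4 == 0);

#ifdef _TMS320C6X
    /* TI C6000 only: at least 4 iterations, no stated maximum, and the
       trip count is a multiple of 4. */
    #pragma MUST_ITERATE(4, , 4)
#endif
    for (i = 0; i < inLength; ++i) {
        int idx = (int)index;
        float k = index - (float)idx;
        float val1 = bufin[idx];
        float val2 = bufin[idx + 1];
        outBuf[i] = k * (val2 - val1) + val1;
        index += inPitch;
    }
}
```

Whether such a hint helps, and by how much, would have to be measured in the simulator; no cycle counts are claimed for this version.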

"phil" <pwicker@mac.com> wrote in news:1105115854.912487.298940
@z14g2000cwz.googlegroups.com:

First of all, I am an ADI partisan. I can't tell you if your benchmark is 
representative since I have never used the TI part.

Here are some additional things to consider with the SHARC:

1. Assembly language programming is very easy. It is very difficult with 
the TI part. This allows you to write fast code where you need it even if 
much of your application is written in C.

2. If you need more MIPS, the 21364 is also available. It runs 1.66x 
faster than the 21262 and uses the same pinout and footprint.

3. We have a variety of boards that support the 21261, 21262 and 21364. 
Check out our web site or call us if you are interested.





-- 
Al Clark
Danville Signal Processing, Inc.
--------------------------------------------------------------------
Purveyors of Fine DSP Hardware and other Cool Stuff
Available at http://www.danvillesignal.com

Hello all,

I'm currently evaluating some DSPs for an upcoming project which
will include the development of a hardware platform and the port of
software currently running on Mac and PC platforms. I've written a
non-optimized piece of code (in C++) and ran it with the simulators
provided by Texas and Analog. I got about 74000 cycles for the 6713 and
32000 for the 21262. I don't want to (re)start a war between pro-TI and
pro-Analog, but this ratio appears quite surprising and I'm wondering
if anyone has already run identical C programs on both targets. Did
you encounter such differences?

Following is the code of a function that is called 8 times:

void DoPitch(int inLength, float inPitch, float inOffset, float* bufin,
             float* outBuf)
{
    float pitch = inPitch;
    float index = inOffset;                    /* fractional read position in bufin */
    register float* bufout = outBuf;
    register float* topout = bufout + inLength;
    while (bufout < topout) {
        int idx = (int)index;                  /* integer part of the position */
        float k = index - (float)idx;          /* fractional part of the position */
        float val1 = bufin[idx];
        float val2 = bufin[idx + 1];
        *bufout++ = k * (val2 - val1) + val1;  /* linear interpolation */
        index += pitch;
    }
}
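
One property of the loop above is worth spelling out: each output sample reads bufin[idx] and bufin[idx + 1], so the input buffer has to extend one sample past the last integer position reached. The helper below is a hypothetical addition, not part of the original post. It replays the same float accumulation to report the highest input index that a call would touch, assuming a non-negative pitch and offset.

```c
#include <assert.h>

/* Hypothetical helper: returns the highest bufin index that DoPitch reads
   for the given arguments, or -1 if nothing is read (inLength <= 0).
   Assumption: inPitch and inOffset are non-negative. */
int MaxInputIndex(int inLength, float inPitch, float inOffset)
{
    float index = inOffset;
    int maxIdx = -1;
    int i;

    assert(inPitch >= 0.0f && inOffset >= 0.0f);

    for (i = 0; i < inLength; ++i) {
        int idx = (int)index;      /* same truncation as in DoPitch */
        if (idx + 1 > maxIdx)
            maxIdx = idx + 1;      /* DoPitch also reads bufin[idx + 1] */
        index += inPitch;          /* same accumulation as in DoPitch */
    }
    return maxIdx;
}
```

For example, a call with a length of 4, a pitch of 1.25 and an offset of 0 steps through positions 0, 1.25, 2.5 and 3.75, so the helper returns 4 and bufin needs at least 5 samples.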