Hi Folks,

Does anyone know where information on this topic may be found? Specifically, I'm interested in comparing the TI 64x family to the x86 family (Pentium IV, M, Xeon, etc.), especially in terms of integer performance and multiply-accumulates.

--Randy
Seeking DSP/x86 Performance Comparisons
Started by ●December 16, 2005
Reply by ●December 16, 2005
Randy Yates wrote:
> Hi Folks,
>
> Does anyone know where information on this topic may be
> found? Specifically, I'm interested in comparing the TI 64x
> family to the x86 family (Pentium IV, M, Xeon, etc.), especially
> in terms of integer performance and multiply-accumulates.

Randy,

Comparing an elephant and a whale is an interesting idea. I doubt you can find the same DSP benchmark for x86 and for a DSP, since there is no common ground for comparison.

Nevertheless, a while ago I compared the speed of x86 and ADSP-21xx on typical DSP operations (FIR/IIR filters). The code was hand-optimized for both CPUs. For the same clock rate, the P5 is about 3 times slower than the DSP. The 486 is about 5 times slower.

It is worth mentioning that on P5+ floating point is faster than integer calculations. There are also things like MMX and SSE; however, you have to use asm to make them efficient.

I would say that for the same clock rate a general P5+ is 2-3 times slower than a general DSP.

VLV
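For readers unfamiliar with the operation being compared, here is a minimal sketch of the multiply-accumulate (MAC) loop at the heart of a FIR filter, in 16-bit fixed point (Q15). It is not the hand-optimized code from the benchmark above; the function name `fir_q15`, the Q15 format, and the rounding and saturation choices are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compute one output sample of a direct-form FIR
 * filter. coeffs[] and history[] hold Q15 values (1.0 is 32768). */
int16_t fir_q15(const int16_t *coeffs, const int16_t *history, size_t n_taps)
{
    assert(coeffs != NULL && history != NULL);

    int64_t acc = 0; /* wide accumulator, playing the role of DSP guard bits */

    for (size_t k = 0; k < n_taps; ++k) {
        /* One multiply-accumulate per tap: Q15 * Q15 gives a Q30 product. */
        acc += (int32_t)coeffs[k] * (int32_t)history[k];
    }

    acc += 1 << 14; /* add half an output LSB so the shift rounds */
    acc >>= 15;     /* Q30 back to Q15 (assumes arithmetic right shift) */

    /* Saturate instead of wrapping, as DSP accumulators usually do. */
    if (acc > INT16_MAX) acc = INT16_MAX;
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```

On a DSP, the loop body typically maps to a single MAC instruction per tap, with the operand fetches done in parallel. On x86 the same body turns into separate loads, a multiply, and an add, which is one reason per-clock comparisons favor the DSP.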
Reply by ●December 16, 2005
On Fri, 16 Dec 2005 11:52:02 -0800, Randy Yates wrote:
> Does anyone know where information on this topic may be
> found? Specifically, I'm interested in comparing the TI 64x
> family to the x86 family (Pentium IV, M, Xeon, etc.), especially
> in terms of integer performance and multiply-accumulates.

There are a few comparison sites around, but not really much at the level that you're looking at. For example: http://www.eembc.org/benchmark/telecom.asp?APPL=TLC

Now, telecom applications are probably (I haven't looked at the code for these) dominated by 16-bit fixed-point operations, so the floating-point parts are kind of the wrong shape. There are some x86 parts there (old AMD K6), and some fairly contemporary PowerPCs of the same sort of class (IBM 970FX aka G5), which shows up in a few industrial embedded products. Certainly the TI C64x parts seem to kill them here, although there seems to be a very wide range of performance for the same part. It must need some careful tuning to get good figures.

In my personal experience, straight C code on GCC -O3 -fast on my old P3/500 is worth about 150 Motorola DSP MIPS: about the factor of three that Vladimir mentioned. Dunno about more modern machines, but I would expect them to be comparatively faster per clock, since they have more and better pipelines, larger caches and faster off-chip busses.

Let us know what you find out when you run your own benchmarks?

Cheers,

-- Andrew
Reply by ●December 16, 2005
Andrew Reilly wrote:
> In my personal experience, straight C code on GCC -O3 -fast on my old
> P3/500 is worth about 150 Motorola DSP MIPS: about the factor of three
> that Vladimir mentioned. Dunno about more modern machines, but I would
> expect them to be comparatively faster per clock, since they have more and
> better pipelines, larger caches and faster off-chip busses.

Modern CPUs like P-3, P-4, etc. are not optimized for 16-bit and 8-bit operations. The 32-bit int computation on P-4 runs almost three times faster than 8 or 16 bit. I had difficulty believing that before I tried the same code (int matrix multiply) with 8, 16 and 32 bit data.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
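An experiment of this kind can be sketched as follows: the same naive matrix multiply is instantiated for 8, 16 and 32 bit elements, so that only the data width changes between runs. This is not the original test code. The macro `DEFINE_MATMUL`, the function names, and the use of unsigned types (chosen so that overflow wraps in a well-defined way) are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Generate one n-by-n, row-major matrix multiply per element type T.
 * The product is formed in 32-bit unsigned arithmetic and then narrowed
 * to T, so the result wraps modulo 2^(bits in T) without undefined
 * behavior. */
#define DEFINE_MATMUL(NAME, T)                                            \
    void NAME(const T *a, const T *b, T *c, size_t n)                     \
    {                                                                     \
        for (size_t i = 0; i < n; ++i) {                                  \
            for (size_t j = 0; j < n; ++j) {                              \
                T sum = 0;                                                \
                for (size_t k = 0; k < n; ++k) {                          \
                    sum = (T)(sum + (uint32_t)a[i * n + k] *              \
                                    (uint32_t)b[k * n + j]);              \
                }                                                         \
                c[i * n + j] = sum;                                       \
            }                                                             \
        }                                                                 \
    }

DEFINE_MATMUL(matmul_u8, uint8_t)   /* 8-bit elements  */
DEFINE_MATMUL(matmul_u16, uint16_t) /* 16-bit elements */
DEFINE_MATMUL(matmul_u32, uint32_t) /* 32-bit elements */
```

To compare speeds, one would call each variant many times on matrices of the same dimension and time the loops, for example with `clock()` from `<time.h>`. The ratio will depend heavily on the CPU, the compiler, and the optimization flags, so the factor of three reported above should not be expected to reproduce on other hardware.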
Reply by ●December 16, 2005
Randy Yates wrote:
> Hi Folks,
>
> Does anyone know where information on this topic may be
> found? Specifically, I'm interested in comparing the TI 64x
> family to the x86 family (Pentium IV, M, Xeon, etc.), especially
> in terms of integer performance and multiply-accumulates.

Try http://www.bdti.com/. They have very thorough comparisons of DSP performance for a wide range of processors.
Reply by ●December 16, 2005
On Fri, 16 Dec 2005 22:12:50 +0000, Vladimir Vassilevsky wrote:
> Andrew Reilly wrote:
>
>> In my personal experience, straight C code on GCC -O3 -fast on my old
>> P3/500 is worth about 150 Motorola DSP MIPS: about the factor of three
>> that Vladimir mentioned. Dunno about more modern machines, but I would
>> expect them to be comparatively faster per clock, since they have more and
>> better pipelines, larger caches and faster off-chip busses.
>
> Modern CPUs like P-3, P-4, etc. are not optimized for 16 bit and 8 bit
> operations. The 32 bit int computation on P-4 runs almost three times
> faster than 8 or 16 bit. I had difficulty believing that before I tried
> the same code (int matrix multiply) with 8, 16 and 32 bit data.

I should clarify: I was thinking of floating-point C code on the P3 vs equivalent fixed-point code on the DSP. Integer does go a bit slower, but I expect that quite a bit of that is due to the paucity of registers.

If you need to do short integer work on a PC, isn't MMX the way it's done?

-- Andrew
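As an illustration of what packed short-integer work looks like on x86, here is a minimal sketch of a 16-bit dot product. Note the swap: it uses the SSE2 intrinsic `_mm_madd_epi16`, the 128-bit counterpart of the MMX `pmaddwd` multiply-add, rather than MMX itself, so it needs an SSE2-capable CPU and compiler. The function name `dot_i16_sse2` and the restriction that `n` be a multiple of 8 are assumptions made to keep the example short.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dot product of two int16 arrays.
 * n must be a multiple of 8 (eight 16-bit lanes per 128-bit register).
 * The 32-bit lane sums can overflow for long or large-valued inputs. */
int32_t dot_i16_sse2(const int16_t *x, const int16_t *y, size_t n)
{
    assert(n % 8 == 0);

    __m128i acc = _mm_setzero_si128(); /* four 32-bit partial sums */

    for (size_t i = 0; i < n; i += 8) {
        /* Unaligned loads, so the caller need not align the arrays. */
        __m128i vx = _mm_loadu_si128((const __m128i *)(x + i));
        __m128i vy = _mm_loadu_si128((const __m128i *)(y + i));

        /* Eight 16x16 multiplies; adjacent products are summed in pairs
         * into four 32-bit lanes, which are then accumulated. */
        acc = _mm_add_epi32(acc, _mm_madd_epi16(vx, vy));
    }

    /* Horizontal sum of the four partial sums. */
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Each loop iteration performs eight multiply-accumulates, which is the kind of throughput a fair comparison against a DSP's MAC units would need to include.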
Reply by ●December 16, 2005
> If you need to do short integer work on a PC, isn't MMX the way it's done?

Yes, I meant to include MMX-based performance in the comparison.

--Randy
Reply by ●December 16, 2005
Andrew Reilly wrote:
>> Modern CPUs like P-3, P-4, etc. are not optimized for 16 bit and 8 bit
>> operations. The 32 bit int computation on P-4 runs almost three times
>> faster than 8 or 16 bit. I had difficulty believing that before I tried
>> the same code (int matrix multiply) with 8, 16 and 32 bit data.
>
> I should clarify: I was thinking of floating point C code on the P3 vs
> equivalent fixed point code on the DSP. Integer does go a bit slower, but
> I expect that quite a bit of that is the paucity of registers.
>
> If you need to do short integer work on a PC, isn't MMX the way it's done?

It is better to develop in C using floating point than to hack integers in assembler. The whole point of developing on a PC is doing it nice and easy. Unfortunately, the C compilers for x86 can't really use MMX and SSE. You have to do it in asm, or you can use somebody else's library like this:

http://www.intel.com/cd/software/products/asmo-na/eng/238685.htm

BTW, the DSP performance benchmarks for x86 can be found here:

http://cache-www.intel.com/cd/00/00/21/93/219360_wp_ipp_benchmark.pdf

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
Reply by ●December 16, 2005
Vladimir Vassilevsky wrote:
> It is good to develop in C using the floating point instead of hacking
> integers in the assembler. The whole point in developing on PC is doing
> it nice and easy. Unfortunately the C compilers for x86 can't really use
> MMX and SSE. You have to do it in asm or you can use somebody else's
> library

The newer gcc (and I think Intel) C compilers allow you to use intrinsic functions to operate on vectors of 4 floats at a time (or vectors of 8, 16, or 32 bit integers for MMX).

The newest compilers even let you go one step further and use operators like + - * ^ and so on rather than intrinsics like _mm_add_ps.

It is now quite possible to get SIMD-optimized code without doing a bit of assembly.

-- Mark Borgerding
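The two styles mentioned above can be sketched side by side. The first function uses the SSE intrinsic `_mm_add_ps` from `<xmmintrin.h>`; the second uses the GCC/Clang vector extension, where a plain `+` operates on all four lanes. The function names and the `v4sf` typedef are made up for this example, and the vector extension is compiler-specific rather than standard C.

```c
#include <string.h>    /* memcpy */
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_add_ps, ... */

/* Intrinsic style: add four floats at once with _mm_add_ps. */
void add4_intrinsic(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a); /* unaligned load of 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}

/* Operator style: a GCC/Clang vector type holding 4 floats (16 bytes). */
typedef float v4sf __attribute__((vector_size(16)));

void add4_operator(const float *a, const float *b, float *out)
{
    v4sf va, vb, vsum;

    /* memcpy avoids alignment and aliasing assumptions about the inputs. */
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);

    vsum = va + vb; /* element-wise add, written with an ordinary operator */

    memcpy(out, &vsum, sizeof vsum);
}
```

Both functions compute the same result. The operator form is more readable and is not tied to one instruction set, while the intrinsic form gives access to instructions that have no operator equivalent.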
Reply by ●December 16, 2005
Mark Borgerding wrote:
>> It is good to develop in C using the floating point instead of hacking
>> integers in the assembler. The whole point in developing on PC is doing
>> it nice and easy. Unfortunately the C compilers for x86 can't really use
>> MMX and SSE. You have to do it in asm or you can use somebody else's
>> library
>
> The newer gcc (and I think intel) C compilers allow you to use intrinsic
> functions to operate on vectors of 4 floats at a time (or vectors of
> 8,16,32 bit integers for mmx)
>
> The newest compilers even let you go one step further and use operators
> like + - * ^ and so on rather than the intrinsics like _mm_add_ps
>
> It is now quite possible to get simd-optimized code without doing a bit
> of assembly.

The compiler by itself is not going to optimize your C code into SIMD. Telling the compiler what to do at the low level is not much different from using the assembler, because it is you who has to know the hardware and who has to tell the compiler about it.

Indeed, the latest ICC tries to use SSE by itself. However, it does a really lousy job: no comparison with hand-written code.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
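For anyone who wants to test the auto-vectorization claim on their own compiler, here is a minimal sketch of the kind of loop that vectorizers are designed to handle: unit stride, no dependence between iterations, and `restrict` pointers promising that the arrays do not overlap. The function name `scale_add` is made up for this example. Whether a given compiler actually emits SIMD code for it, and how that compares with hand-written code, has to be checked in the generated assembly; nothing here guarantees it.

```c
#include <stddef.h>

/* Hypothetical helper: out[i] = a[i] + gain * b[i].
 * restrict tells the compiler that out, a and b never alias, which
 * removes one common obstacle to vectorizing the loop. */
void scale_add(float *restrict out, const float *restrict a,
               const float *restrict b, float gain, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] + gain * b[i];
    }
}
```

With GCC, the relevant option is `-ftree-vectorize`, which is included in `-O3`. Compiling with `-S` and looking for packed SSE instructions in the output shows whether the loop was vectorized.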






