C6727 experiences

Started by Andrew Elder December 13, 2005
All,

I wonder if anyone has real-world performance comments on the C6727 ?

Is it MUCH faster than the 6713 ?

Do the extra 32 registers allow the compiler to "go to town" on
optimization ?

We currently use the C6713 and enjoy the L2 caching performance speedup,
even if it is at the expense of 100% deterministic behaviour. Since the
C6727 doesn't have any L2 cache, it looks like we would have to move
things like FIR filter coeffs to internal memory at runtime to get any
sort of performance. In fact we would have to restructure how many of
our algorithms run in order to support shuffling arrays of data in and out
of internal memory.
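
To make the "shuffling" concrete, here is a minimal sketch in plain C of staging a block FIR through fast buffers. It is only an illustration under stated assumptions: the names (fir_staged, fast_in, NUM_TAPS, BLOCK_LEN) are hypothetical, the static arrays merely stand in for buffers a linker command file would place in on-chip RAM, and memcpy stands in for whatever DMA transfer a real C6727 design would use.

```c
#include <stddef.h>
#include <string.h>

#define NUM_TAPS   4   /* hypothetical filter length */
#define BLOCK_LEN  8   /* hypothetical number of samples staged per block */

/* Stand-ins for buffers that would be placed in on-chip RAM (assumption). */
static float fast_coeffs[NUM_TAPS];
static float fast_in[BLOCK_LEN + NUM_TAPS - 1];  /* history + new block */
static float fast_out[BLOCK_LEN];

/* Direct-form FIR on one staged block: y[n] = sum_k h[k] * x[n - k].
   x holds n_taps-1 history samples followed by n_out new samples. */
static void fir_block(const float *x, const float *h, float *y,
                      size_t n_out, size_t n_taps)
{
    size_t n, k;
    for (n = 0; n < n_out; n++) {
        float acc = 0.0f;
        for (k = 0; k < n_taps; k++)
            acc += h[k] * x[n + n_taps - 1 - k];
        y[n] = acc;
    }
}

/* Filter `total` samples that live in slow memory, one block at a time.
   In this sketch `total` must be a multiple of BLOCK_LEN. */
void fir_staged(const float *slow_in, float *slow_out, size_t total,
                const float *slow_coeffs)
{
    const size_t hist = NUM_TAPS - 1;
    size_t off;

    /* Coefficients are copied once, up front. */
    memcpy(fast_coeffs, slow_coeffs, sizeof fast_coeffs);
    /* Start with zeroed history. */
    memset(fast_in, 0, hist * sizeof(float));

    for (off = 0; off < total; off += BLOCK_LEN) {
        /* Stage in: a real design would probably use DMA here. */
        memcpy(fast_in + hist, slow_in + off, BLOCK_LEN * sizeof(float));
        fir_block(fast_in, fast_coeffs, fast_out, BLOCK_LEN, NUM_TAPS);
        /* Stage out. */
        memcpy(slow_out + off, fast_out, sizeof fast_out);
        /* Carry the last NUM_TAPS-1 samples over as history. */
        memmove(fast_in, fast_in + BLOCK_LEN, hist * sizeof(float));
    }
}
```

The restructuring cost is visible even at this size: the filter itself is a few lines, while the staging and history bookkeeping around it is what every algorithm would need to grow.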

General comments anyone ?

- Andrew E.



Andrew-

We compared C6727 closely to C5502 for high-performance, high-precision
acoustic/audio applications and stayed with C5502, which has an instruction cache and
is extremely efficient with 32-bit precision operations at 300 MHz. The lack of L2
cache is a big deal.

-Jeff



Andrew,

some comments below.

mikedunn

--- Jeff Brower <jbrower@jbro...> wrote:

> > Is it MUCH faster than the 6713 ?
Just because it is newer, doesn't always mean faster
[apps].

> We compared C6727 closely to C5502 for high-performance, high-precision
> acoustic/audio applications and stayed with C5502, which has an instruction cache and
> is extremely efficient with 32-bit precision operations at 300 MHz. The lack of L2
> cache is a big deal.

Some random comments...
- I think that you really need to take a close look
at the C6727 before selecting [or not selecting] it
[isn't that always the case??].
- I think that this part is much more 'polarizing'
than any other c6x device when it comes to being 'app
friendly' or 'app unfriendly'.
- Utilizing the ROM DSPLIB and BIOS functions does
kick up the performance.
- On paper [not benchmarked] some of the DMA
capabilities look killer [assuming that you need
them].
mikedunn




Mike and Jeff,

Thanks for the comments. I guess the most useful observation is that the C6727 is not necessarily an obvious upgrade path from a C6713.

Mike, what are ROM DSPLIB functions ?
Has anyone noticed whether updated DSPLIB routines for the 6727 have been released by TI ?

- Andrew E

Jeff and Andrew,

I put this on the shelf and had an occasion to look
at some c6727 issues in more detail [one group of
functions picked up a 30%+ speed increase with only a
recompile and a noticeable reduction in code size].
After trying to 'reverse engineer' where the
improvements were located, I thought that I would
apply some of my own advice - RTFM.

My enlightened personal suggestion is that you perform
some serious benchmarking on the c6727 if your 6713 is
running out of gas - and maybe even if it isn't.
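
As a starting point for that kind of measurement, here is a minimal, portable timing harness sketched in C. It is an assumption-laden illustration, not something from the TI documents: the names (bench_ticks_per_call, kernel_under_test, kernel_calls) are hypothetical, the dot product is only a placeholder workload, and on the target you would likely replace clock() with a hardware cycle counter or the profiler.

```c
#include <time.h>

#define VEC_LEN 256   /* hypothetical workload size */

static float vec_a[VEC_LEN];
static float vec_b[VEC_LEN];

/* Written by the kernel so the optimizer cannot discard the work. */
static volatile float sink;

/* Counts invocations so the caller can confirm the kernel really ran. */
static unsigned long kernel_calls;

/* Placeholder workload: replace with the function being evaluated. */
static void kernel_under_test(void)
{
    float acc = 0.0f;
    int i;
    for (i = 0; i < VEC_LEN; i++)
        acc += vec_a[i] * vec_b[i];
    sink = acc;
    kernel_calls++;
}

/* Average clock() ticks per call over `reps` timed runs.
   One untimed warm-up call is made first so that cache fills
   do not dominate the measurement. */
double bench_ticks_per_call(void (*fn)(void), int reps)
{
    clock_t start, stop;
    int i;

    fn();                      /* warm-up, not timed */
    start = clock();
    for (i = 0; i < reps; i++)
        fn();
    stop = clock();
    return (double)(stop - start) / (double)reps;
}
```

Building the same source for both devices and calling bench_ticks_per_call(kernel_under_test, 1000) on each gives a first-order comparison; dropping the warm-up call shows how much of the time is cache fill.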

I found the following documents very interesting and
helpful - especially the migration guide. It's
amazing how much more meaningful some of the
information can be after you have 'mucked with the
details' and measured it.

SPRAA78, May 2005
TMS320C6713 to TMS320C672x Migration Guide

SPRS277A, May 2005, revised November 2005
TMS320C672x Floating-Point Digital Signal Processor ROM
- On-Chip Bootloader
- Full-Feature Version of DSP/BIOS Operating System
- Optimized Math Library (FastRTS)
- Library of Commonly Used DSP Functions (DSPLIB)

RE. Jeff's cache/c5502/c6727 comments
Jeff's comments seemed a bit odd, but I was busy at
the time... After seeing a very significant
performance increase with the c6727, I decided that
'something was wrong with the picture'. I am not sure
if Jeff's code just has 'not much to do' or what.
Although it is true that there is no L2 cache on the
c6727, it is also true that the c6727 L1P cache is
32KB [vs. 4KB for the 6713 L1P, or 64KB max for the 6713 L2].
I looked up the c5502 and it has even less cache: 16KB.

mikedunn





Mike-

I wish our code had not much to do :-) My experience has been that C5xxx code
compiles far more compactly than C67xx, making onchip memory and cache more
effective.

The C6727's small size is impressive for floating-point, but I still don't see a persuasive
advantage for audio/acoustic applications vs. smaller C55xx devices that have lower
power consumption and a wider range of the peripherals those apps typically need. Vs. the
C6713 it's faster, but what if the app needs SDRAM? 64k x combined total data + prog
memory is sort of a throwback, not to mention requiring very compact code (see
previous point).

If no L2 cache, no low power, no suite of I/O peripherals, then why not double the
clock rate? C641x runs at 1 GHz, TigerSharc runs 600 MHz. Where's the compelling
reason?

-Jeff




Jeff,

--- Jeff Brower <jbrower@jbro...> wrote:

> If no L2 cache, no low power, no suite of I/O peripherals, then why not double the
> clock rate? C641x runs at 1 GHz, TigerSharc runs 600 MHz. Where's the compelling
> reason?

I guess that you have listed some of the reasons that
TI makes more than one architecture.

mikedunn
