
Technical discussions about the TI C6000 DSPs (including the c62x, c64x and c67x DSPs).

  


Is Assembly code/linear assembly code necessary? - jogg...@gmail.com - Mar 29 23:32:54 2009

Hi, all
   Lately I have been reading these documents about optimization:
spru187o TMS320C6000 Optimizing Compiler v6.1.pdf
spru198i TMS320C6000 Programmer’s Guide.pdf
spru732c TMS320C64x/C64x+ DSP CPU and Instruction Set Reference Guide.pdf

The C64x+ is a special architecture, and instructions have different latencies
depending on the type of instruction. At first, I thought that refining C/C++ code
with pragmas and programming with intrinsics could solve the optimization
problem. But I found some slides (ppt) on the internet, and it seems assembly
code is sometimes necessary.
In your experience, is assembly code or linear assembly code
necessary? Under what conditions, and for what applications?

Thanks in advance.
Jogging

_____________________________________




(You need to be a member of c6x -- send a blank email to c6x-subscribe@yahoogroups.com )

Re: Is Assembly code/linear assembly code necessary? - rvsasi - Mar 31 9:53:26 2009

Hi,
It is difficult to give an answer without a complete understanding of the
real-time deadlines of the system.

Let me take a swing at it in the most general fashion.

If you are looking for an average issue slot usage of 6-7 or above (out of
a total possible 8) for the inner loops/kernel of your algorithm, then there
might be a need for pipelined/linear assembly (more likely you would need
pipelined assembly). But pipelined assembly takes a long time to develop and
is harder to maintain, by an order of magnitude.

Linear assembly is much easier to code and somewhat easier to maintain than
pipelined assembly. Linear assembly has given me an average issue slot usage
on the order of 5 to 6 for the inner loops. But it is possible that out of
10 cycles, two or three might be at 4 out of a total of 8 per cycle.

Obviously you have to make an initial target mapping analysis of your
requirement by mapping the loads/stores and arithmetic of your algorithm
onto the C6X VLIW instruction set capabilities, keeping in mind all
restrictions of the processor (cross path stalls etc.). If you are using
existing library functions, this might become a little difficult. But in
general this analysis gives you a good idea of what is achievable.

In general, intrinsics with good compiler directives (pragmas) do the
job for most applications. C6X provides very efficient pragmas for
optimization. C6X intrinsics with good pragmas easily give you an average
issue slot usage of 4-6.

Pragmas are critical, but also critical is the use of type qualifiers like
restrict, const etc.
In addition, C6X provides pragmas to align memory elements. Completely
avoiding unaligned accesses can be a benefit. In addition, the C6X compiler
provides good debug info (I think you need to turn it on) on what exactly can
improve the algorithm performance. For example, whether there are excessive
register-to-memory spills.
Regards
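
As a rough illustration of the qualifiers and alignment hints mentioned above, here is a minimal sketch in C. The buffer names (in_a, in_b, out_sum), the function vec_add and the 8-byte alignment are assumptions made up for this example; they do not come from the thread. DATA_ALIGN and _nassert are TI compiler features, so they are guarded here and reduce to nothing on other compilers.

```c
#include <stddef.h>

#define BUF_LEN 64

/* DATA_ALIGN is a TI compiler pragma; it is guarded so that other
 * compilers simply skip it. Buffer names are hypothetical. */
#ifdef __TI_COMPILER_VERSION__
#pragma DATA_ALIGN(in_a, 8)
#pragma DATA_ALIGN(in_b, 8)
#pragma DATA_ALIGN(out_sum, 8)
#endif
short in_a[BUF_LEN];
short in_b[BUF_LEN];
short out_sum[BUF_LEN];

/* _nassert is a TI intrinsic that generates no code. On other
 * compilers, fall back to a no-op so the sketch still builds. */
#ifndef __TI_COMPILER_VERSION__
#define _nassert(cond) ((void)0)
#endif

/* restrict promises that out, a and b never overlap, so stores to out
 * cannot change values that are still to be loaded from a or b. */
void vec_add(short *restrict out, const short *restrict a,
             const short *restrict b, int n)
{
    int i;

    /* Promise the compiler that all three pointers are 8-byte aligned. */
    _nassert(((size_t)out & 7) == 0);
    _nassert(((size_t)a & 7) == 0);
    _nassert(((size_t)b & 7) == 0);

    for (i = 0; i < n; i++) {
        out[i] = (short)(a[i] + b[i]);
    }
}
```

A caller would pass the aligned buffers, for example vec_add(out_sum, in_a, in_b, BUF_LEN). Whether the hints actually raise issue slot usage depends on the compiler version and options, so it is worth checking the generated assembly.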

_____________________________________






Re: Is Assembly code/linear assembly code necessary? - jogging song - Apr 22 11:35:52 2009

Hi,
    Thanks for your opinion. I agree with you completely.
Recently I found that memory access may influence performance more than
assembly code does.
To learn more about the memory access effect, I ran some tests.
I ran the IMG_perimeter function from the imglib library on the DM6437 EVM.
In the example, the test program runs the function in C and then the
function in assembly code.
At first, I put the data in L2 RAM. The resulting times are below:
IMG_perimeter asm cycle: 1029
IMG_perimeter c cycle: 2941

Then I put the data in external DDR2 memory. The resulting times are below:
IMG_perimeter asm cycle: 6250
IMG_perimeter c cycle: 13234

We can see that if the data is put in L2 RAM, the time for the C version is
reduced from 13234 to 2941 cycles. That is a much bigger gain than assembly
code optimization, which reduces the time from 13234 to 6250 cycles.
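
The comparison above can be written out as simple ratios. The sketch below uses only the four cycle counts quoted in this post; the function and constant names are made up for illustration. Moving the data to L2 RAM is roughly a 4.5x gain for the C version, while switching to assembly with the data left in DDR2 is roughly a 2.1x gain.

```c
#include <stdio.h>

/* Cycle counts quoted in the post (IMG_perimeter on the DM6437 EVM). */
enum {
    L2_ASM_CYCLES  = 1029,
    L2_C_CYCLES    = 2941,
    DDR_ASM_CYCLES = 6250,
    DDR_C_CYCLES   = 13234
};

/* How many times fewer cycles "after" takes compared with "before". */
double speedup(unsigned long before_cycles, unsigned long after_cycles)
{
    return (double)before_cycles / (double)after_cycles;
}

/* Print the ratios discussed in the post. */
void print_speedups(void)
{
    printf("C, DDR2 -> L2 RAM:       %.2fx\n",
           speedup(DDR_C_CYCLES, L2_C_CYCLES));
    printf("DDR2, C -> asm:          %.2fx\n",
           speedup(DDR_C_CYCLES, DDR_ASM_CYCLES));
    printf("L2 RAM, C -> asm:        %.2fx\n",
           speedup(L2_C_CYCLES, L2_ASM_CYCLES));
    printf("asm, DDR2 -> L2 RAM:     %.2fx\n",
           speedup(DDR_ASM_CYCLES, L2_ASM_CYCLES));
}
```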

Before, I paid attention only to assembly code optimization and had not
noticed the memory access effect.

My other question is this: memory access latency is multiple cycles in the
C64x+ pipeline.
A load instruction needs five cycles to obtain its data. If a queue or tree
data structure is used,
I don't know how to optimize it. Can anyone share their experience with this?

Thanks in advance.
Jogging


_____________________________________


RE: Is Assembly code/linear assembly code necessary? - christophe blouet - Apr 22 14:11:09 2009

Hi,

I have some doubts about your figures. Are you sure you had the cache
enabled when running in external memory?

Where was the data to process? In internal SDRAM as well?

I wouldn't use the term internal L2 RAM. L2 means Level 2 cache; internal
RAM is internal RAM. They sound like two different things to me.

Normally, with some good pragmas and optimisation options given to the
compiler, you can get the same result as assembly code for far less effort.

Regards


_____________________________________


Re: Is Assembly code/linear assembly code necessary? - jogging song - Apr 22 22:00:12 2009

Hi,
     I am sure that the external memory is cacheable, because I obtained
three sets of figures.
The third set of figures is with the cache off for external memory:
IMG_perimeter asm cycle: 28444
IMG_perimeter c cycle: 298242

The IMG_perimeter function needs one input and one output.
In the tests I put them both in internal RAM or both in DDR2.

Best Regards
Jogging


Re: Is Assembly code/linear assembly code necessary? - Michael Dunn - Apr 23 1:36:47 2009

jogging,

On Wed, Apr 22, 2009 at 7:38 AM, jogging song <j...@gmail.com> wrote:
> My another question is that: memory access latency is multiple cycles in the
> C64x+ pipeline.
> For load instruction, it needs five cycles to obtain data. If queue or tree
> data structure is used,
> I don't know how to optimize it. Can anyone share his experience with it?
<mld>
Check out 'delay slots' and 'load instructions' in spru732c. If you
look at the assembly code generated by the C compiler, you will
probably see that it makes use of the delay slots.
Q1. Are you comparing optimized [by the compiler] C code with assembly code??

mikedunn
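
To make the delay-slot point concrete for linked structures, here is a small sketch in C. The node type and both functions are hypothetical and only illustrate the dependency pattern. In a linked list each load address comes from the previous load, so the loads form a serial chain and the delay slots of one load cannot be filled with the next load of that chain. With an array, every address is known from the index, so the compiler is free to overlap several loads.

```c
#include <stddef.h>

/* Hypothetical list node, used only for this illustration. */
struct node {
    int value;
    struct node *next;
};

/* Pointer chasing: the address of the next node is only known once
 * the load of head->next has completed, so loads are serialised. */
int sum_list(const struct node *head)
{
    int sum = 0;

    while (head != NULL) {
        sum += head->value;
        head = head->next;
    }
    return sum;
}

/* Array walk: the addresses values[i], values[i + 1], ... do not depend
 * on loaded data, so independent loads can be issued back to back. */
int sum_array(const int *restrict values, int n)
{
    int i;
    int sum = 0;

    for (i = 0; i < n; i++) {
        sum += values[i];
    }
    return sum;
}
```

One possible consequence, stated as a suggestion rather than something tested in this thread: where the algorithm allows it, keeping queue or tree payloads in contiguous arrays and walking them by index gives the compiler more independent loads to schedule.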

--
www.dsprelated.com/blogs-1/nf/Mike_Dunn.php

_____________________________________


Re: Is Assembly code/linear assembly code necessary? - Michael Dunn - Apr 23 1:37:38 2009

christophe,

On Wed, Apr 22, 2009 at 11:23 AM, christophe blouet
<c...@hotmail.com> wrote:
> Hi,
>
> I have some doubts on your figures, are you sure you had Cache enabled when running in external memory?
>
> Where were the data to process? in internal SDRAM as well?
<mld>
If we are being picky about terminology [I do not care for the term
'L2 RAM'], should we not say 'internal SDRAM'?? :-)

mikedunn

--
www.dsprelated.com/blogs-1/nf/Mike_Dunn.php

RE: Is Assembly code/linear assembly code necessary? - christophe blouet - Apr 23 2:15:17 2009


OK, that sounds plausible with the cache enabled, but how big is your cache? It can
change the results if your program is big. If it is a small one, then once it is
loaded into cache you won't see much difference between internal SDRAM ;-) and
external DDR.

Do have a look at the C optimisations: if you give the compiler a minimum loop
count, it can unroll the loop to do more calculations per iteration, and your
code then won't suffer from pipeline delays. Using this method I got the same
results as the best optimised asm routine.
Regards
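
A minimal sketch of the "minimum loop count" hint mentioned above. On the TI compiler the hint is written as `#pragma MUST_ITERATE(min, max, multiple)`, and `restrict` promises that the buffers do not overlap. The function name `dot_product` and the contract (8 to 1024 iterations, a multiple of 4) are assumptions made up for illustration and do not come from the thread; the pragma is guarded so the code also builds with other compilers.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of two 16-bit vectors.
 * Hypothetical contract: 8 <= n <= 1024 and n is a multiple of 4. */
int dot_product(const short *restrict a, const short *restrict b, size_t n)
{
    int sum = 0;
    size_t i;

    /* Check the contract that is promised to the compiler below. */
    assert(n >= 8 && n <= 1024 && n % 4 == 0);

#ifdef __TI_COMPILER_VERSION__
    /* TI-only hint: the loop runs 8..1024 times, always a multiple of 4,
     * so the compiler is free to unroll and software-pipeline it. */
    #pragma MUST_ITERATE(8, 1024, 4)
#endif
    for (i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Whether the hint actually helps is best judged from the generated assembly listing, not assumed.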

Date: Thu, 23 Apr 2009 09:45:30 +0800
Subject: Re: [c6x] Is Assembly code/linear assembly code necessary?
From: j...@gmail.com
To: c...@hotmail.com
CC: r...@yahoo.com; c...@yahoogroups.com

Hi,
     I am sure the external memory is cacheable, because I obtained three sets of
figures.
The third set of figures is with the cache off for external memory:
IMG_perimeter asm cycle: 28444
IMG_perimeter c cycle: 298242
The IMG_perimeter function needs one input and one output.
In the test I put them both in internal RAM or both in DDR2.

Best Regards
Jogging
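
The cycle counts reported in this thread can be turned into speed-up factors with a few lines of C. This is only arithmetic on the posted numbers; the helper names are illustrative, and the labels (L2 RAM, DDR2 with cache on, DDR2 with cache off) follow the poster's own description of the three test runs.

```c
#include <assert.h>

/* Speed-up factor when a cycle count drops from `before` to `after`. */
double speedup(unsigned long before, unsigned long after)
{
    assert(after != 0);
    return (double)before / (double)after;
}

/* IMG_perimeter cycle counts as posted for the DM6437 EVM. */
enum {
    L2_ASM      = 1029,  L2_C      = 2941,    /* data in L2 RAM          */
    DDR_ASM     = 6250,  DDR_C     = 13234,   /* data in DDR2, cache on  */
    NOCACHE_ASM = 28444, NOCACHE_C = 298242   /* data in DDR2, cache off */
};

/* C code, DDR2 -> L2 RAM: about 4.5x */
double gain_from_memory(void)   { return speedup(DDR_C, L2_C); }

/* DDR2, C -> assembly: about 2.1x */
double gain_from_assembly(void) { return speedup(DDR_C, DDR_ASM); }

/* C code in DDR2, cache off -> cache on: about 22.5x */
double gain_from_cache(void)    { return speedup(NOCACHE_C, DDR_C); }
```

On these figures, data placement and caching move the result more than the switch from C to assembly does.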

On Thu, Apr 23, 2009 at 12:23 AM, christophe blouet <c...@hotmail.com>
wrote:

Hi,

I have some doubts about your figures. Are you sure you had the cache enabled when
running from external memory?

Where was the data to be processed? In internal SDRAM as well?

I wouldn't use the term "internal L2 RAM": L2 means Level 2 cache, and internal RAM is
internal RAM. They sound like two different things to me.

Normally, with some good pragmas and optimisation options for the compiler, you
can get the same result as assembly code, but with far less effort.

Regards

> To: r...@yahoo.com
> CC: c...@yahoogroups.com
> From: j...@gmail.com
> Date: Wed, 22 Apr 2009 20:38:20 +0800
> Subject: Re: [c6x] Is Assembly code/linear assembly code necessary?
>
> Hi,
> Thanks for your opinion. I agree with you completely.
> Recently I found that memory access may influence performance more than
> assembly code does.
> To learn more about the effect of memory access, I ran some tests.
> I ran the IMG_perimeter function from the imglib library on the DM6437 EVM.
> In the example, the test program runs the function in C and then in
> assembly code.
> First I put the data in L2 RAM; the resulting times are below:
> IMG_perimeter asm cycle: 1029
> IMG_perimeter c cycle: 2941
>
> Then I put the data in external DDR2 memory; the resulting times are below:
> IMG_perimeter asm cycle: 6250
> IMG_perimeter c cycle: 13234
>
> We can see that if the data is put in L2 RAM, the time is reduced
> from 13234 to 2941. That is a much bigger gain than the assembly code
> optimization, which reduces the time from 13234 to 6250.
>
> Until now I had paid attention only to assembly code optimization and had
> not noticed the effect of memory access.
>
> My other question: memory access latency is multiple cycles in the
> C64x+ pipeline.
> A load instruction needs five cycles to obtain its data. If a queue or tree
> data structure is used,
> I don't know how to optimize it. Can anyone share their experience with it?
>
> Thanks in advance.
> Jogging

Re: Is Assembly code/linear assembly code necessary? - jogging song - Apr 23 10:14:47 2009

Hi, Michael
      First, I would like to know why linear assembly code is necessary.
I can provide information to the C compiler with pragmas and restrict.
Intrinsics can be used for instruction selection. So in my opinion linear
assembly code is not necessary. The benefit of assembly code is instruction
selection. With pragmas, restrict and intrinsics I can implement most of
what assembly code offers.

I have worked on optimization for a while and find that memory access is more
important, because it influences performance greatly.
So the first step in the workflow for improving the performance of C code should
be to improve the memory access pattern.
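
As one illustration of what "memory access pattern" can mean in practice, the sketch below sums the same image twice: once in storage order and once column by column. The function names and the row-by-row image layout are assumptions for the example; nothing here is taken from imglib.

```c
#include <assert.h>
#include <stddef.h>

/* Walk a row-by-row image in storage order. Consecutive reads touch
 * adjacent addresses, so every cache line that is fetched is fully used. */
long sum_row_order(const unsigned char *img, size_t rows, size_t cols)
{
    long sum = 0;
    size_t r, c;

    assert(img != NULL);
    for (r = 0; r < rows; r++)
        for (c = 0; c < cols; c++)
            sum += img[r * cols + c];
    return sum;
}

/* Same result, but walking column by column: each read jumps `cols`
 * bytes ahead, which tends to waste most of each fetched cache line. */
long sum_column_order(const unsigned char *img, size_t rows, size_t cols)
{
    long sum = 0;
    size_t r, c;

    assert(img != NULL);
    for (c = 0; c < cols; c++)
        for (r = 0; r < rows; r++)
            sum += img[r * cols + c];
    return sum;
}
```

Both functions return the same value; only the order of the loads differs, and that order is what a cache or a DMA block transfer is sensitive to.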

I have no experience of using DMA on the C64x+. Can anyone share their experience
of using DMA? How does DMA improve performance? As far as I can find, DMA is not
part of DSP/BIOS.
I want to know whether DMA can be used without DSP/BIOS.

Best Regards
Jogging

On Thu, Apr 23, 2009 at 1:32 PM, Michael Dunn <m...@gmail.com> wrote:

> jogging,
>
> On Wed, Apr 22, 2009 at 7:38 AM, jogging song <j...@gmail.com> wrote:
> > Hi,
> > Thanks for your opinion. I agree with you completely.
> > Recently I found that memory access may influence performance more than
> > assembly code does.
> > To learn more about the effect of memory access, I ran some tests.
> > I ran the IMG_perimeter function from the imglib library on the DM6437 EVM.
> > In the example, the test program runs the function in C and then in
> > assembly code.
> > First I put the data in L2 RAM; the resulting times are below:
> > IMG_perimeter asm cycle: 1029
> > IMG_perimeter c cycle: 2941
> >
> > Then I put the data in external DDR2 memory; the resulting times are below:
> > IMG_perimeter asm cycle: 6250
> > IMG_perimeter c cycle: 13234
> >
> > We can see that if the data is put in L2 RAM, the time is reduced
> > from 13234 to 2941. That is a much bigger gain than the assembly code
> > optimization, which reduces the time from 13234 to 6250.
> >
> > Until now I had paid attention only to assembly code optimization and had
> > not noticed the effect of memory access.
> >
> > My other question: memory access latency is multiple cycles in the
> > C64x+ pipeline.
> > A load instruction needs five cycles to obtain its data. If a queue or tree
> > data structure is used,
> > I don't know how to optimize it. Can anyone share their experience with it?
> <mld>
> Check out 'delay slots' and 'load instructions' in spru732c. If you
> look at the assembly code generated by the C compiler, you will
> probably see that it makes use of the delay slots.
> Q1. Are you comparing optimized [by the compiler] C code with assembly
> code??
>
> mikedunn
> >
> > Thanks in advance.
> > Jogging

Re: Is Assembly code/linear assembly code necessary? - Michael Dunn - Apr 23 22:50:42 2009

jogging,

On Thu, Apr 23, 2009 at 4:21 AM, jogging song <j...@gmail.com> wrote:
> Hi, Michael
>       First, I would like to know why linear assembly code is necessary.
<mld>
Maybe you misunderstood. I am not saying that coding in assembly is necessary.
What is necessary is to understand what assembly code is generated by
the C compiler.
You might effectively optimize C code by carefully using pragmas,
intrinsics, and restrict. IMO, You cannot evaluate the effectiveness
of pragmas, intrinsics, and restrict without looking at before and
after versions of the assembly listing.

> I can provide information to the C compiler with pragmas and restrict.
> Intrinsics can be used for instruction selection. So in my opinion linear
> assembly code is not necessary. The benefit of assembly code is instruction
> selection.
<mld>
and sequence.

> With pragmas, restrict and intrinsics I can implement most of
> what assembly code offers.
>
> I have worked on optimization for a while and find that memory access is more
> important, because it influences performance greatly.
> So the first step in the workflow for improving the performance of C code should
> be to improve the memory access pattern.
>
> I have no experience of using DMA on the C64x+. Can anyone share their experience
> of using DMA? How does DMA improve performance? As far as I can find, DMA is not
> part of DSP/BIOS.
<mld>
DSP/BIOS supports DMA. Look up 'Direct Memory Access' on Wikipedia.
The short version is that DMA uses a state machine to perform memory
[or peripheral] accesses while the CPU is executing instructions.
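
A common way to exploit that overlap is "ping-pong" buffering: while the CPU works on one block, the DMA fetches the next. The sketch below only shows the control flow. `dma_start()` and `dma_wait()` are hypothetical stand-ins, not a real TI or DSP/BIOS API; here they are implemented with `memcpy` so the example runs anywhere. The block size and the summing "algorithm" are likewise made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK 256   /* samples per block (hypothetical size) */

/* Hypothetical driver stand-ins. On real hardware dma_start() would
 * program a transfer and return at once, and dma_wait() would block
 * until the transfer-complete flag is set. */
static void dma_start(short *dst, const short *src, size_t n)
{
    memcpy(dst, src, n * sizeof *dst);
}
static void dma_wait(void) { }

/* Sum `nblocks` blocks that live in slow memory (`ext`), always
 * computing out of one of two small buffers assumed to be in fast RAM. */
long process_stream(const short *ext, size_t nblocks)
{
    static short buf[2][BLOCK];          /* ping-pong buffers */
    long total = 0;
    size_t b, i;

    assert(ext != NULL);
    if (nblocks == 0)
        return 0;

    dma_start(buf[0], ext, BLOCK);       /* prefetch the first block */
    for (b = 0; b < nblocks; b++) {
        const short *cur = buf[b & 1];

        dma_wait();                      /* block b has arrived */
        if (b + 1 < nblocks)             /* fetch b+1 while computing on b */
            dma_start(buf[(b + 1) & 1], ext + (b + 1) * BLOCK, BLOCK);

        for (i = 0; i < BLOCK; i++)
            total += cur[i];
    }
    return total;
}
```

With a real DMA engine the inner loop and the next transfer run at the same time, so the CPU mostly sees fast-memory access times.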

mikedunn
> I want to know whether DMA can be used without DSP/BIOS.
>
> Best Regards
> Jogging

--
www.dsprelated.com/blogs-1/nf/Mike_Dunn.php

Fir filter with high stop band attenuation - Ramaraju SVS - Apr 25 12:26:36 2009

Hi,
I am facing a problem with a filter implementation.
The sampling frequency of my ADC is 2560 Hz. The ADC gives data with a
frequency range of 0-5120 Hz. Because of a hardware limitation I cannot
sample at less than 2560 Hz, but the signal of interest is 0 to 50 Hz. So after
acquiring data from the ADC I am implementing an FIR filter whose sampling
frequency is 10240 Hz and whose cutoff frequency is 50 Hz. After that I
decimate the filtered data by a factor of 20. In other words, I am actually
oversampling the signal 20 times.
Fs = 2560 Hz
Fc = 50 Hz
Decimation factor = 20
So the effective sampling frequency is 2560/20 = 128 Hz.
My FFT spectrum shows frequencies ranging from 0 to 64 Hz.
According to my system requirements, anything above -80 dB will be considered
to be signal. To achieve this, I should select a filter stop-band
attenuation > 90 dB.
But a conventional FIR filter cannot give this much attenuation.
Can anyone suggest a technique to achieve this?
Thanks in advance,
Regards,
Ramaraju
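
To get a feel for the size of the problem, the numbers above can be fed into Kaiser's rule of thumb for FIR length, N ≈ (A - 7.95) / (14.36 · Δf / Fs) + 1. This is only a sketch: it assumes Fs = 2560 Hz, a pass band up to 50 Hz and a stop band starting at 64 Hz (the Nyquist frequency after decimation), none of which the post states explicitly, and the function names are illustrative.

```c
#include <assert.h>

/* Output rate after decimating by an integer factor. */
double decimated_rate_hz(double fs_hz, int factor)
{
    assert(factor > 0);
    return fs_hz / factor;
}

/* Kaiser's estimate of the FIR length needed for `atten_db` of
 * stop-band attenuation with a transition band of `trans_hz`
 * at a sampling rate of `fs_hz`. */
int kaiser_taps(double atten_db, double trans_hz, double fs_hz)
{
    double n;
    int taps;

    assert(atten_db > 7.95 && trans_hz > 0.0 && fs_hz > 0.0);
    n = (atten_db - 7.95) / (14.36 * trans_hz / fs_hz) + 1.0;

    taps = (int)n;              /* round up to a whole number of taps */
    if ((double)taps < n)
        taps++;
    return taps;
}
```

Under these assumptions, `kaiser_taps(90.0, 14.0, 2560.0)` comes out at roughly a thousand taps for a single-stage filter, which may be why a conventional design looks impractical here.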

RE: Fir filter with high stop band attenuation - christophe blouet - Apr 27 16:11:51 2009

Why don't you cascade two filters?
Are you sure you have enough dynamic range to achieve more than 90 dB after
computation?
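
Both questions can be roughed out numerically. The sketch below assumes ideal behavior: the stop-band attenuations of cascaded filters add in dB (their magnitude responses multiply), and a fixed-point word offers about 6.02 dB of dynamic range per bit. The function names are illustrative, and real filters lose some of this to coefficient quantization and rounding noise.

```c
#include <assert.h>

/* 20*log10(2): each extra bit adds about 6.02 dB of dynamic range. */
#define DB_PER_BIT 6.0206

/* Ideal dynamic range of a `bits`-wide fixed-point word. */
double fixed_point_range_db(int bits)
{
    assert(bits > 0);
    return bits * DB_PER_BIT;
}

/* Ideal stop-band attenuation of two filters in cascade. */
double cascade_attenuation_db(double stage1_db, double stage2_db)
{
    return stage1_db + stage2_db;
}

/* Smallest word length whose ideal range reaches `target_db`. */
int bits_needed(double target_db)
{
    int bits = 1;

    assert(target_db > 0.0);
    while (fixed_point_range_db(bits) < target_db)
        bits++;
    return bits;
}
```

Two 45 dB stages would give 90 dB on paper, and 90 dB needs at least 15 ideal bits, so 16-bit data and coefficients leave very little margin.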
