comp.dsp | ADSP 21160 questions on example

I'm new to DSP, so I downloaded this example from the Analog Devices site
(ftp://ftp.analog.com/pub/dsp/2116x/examples/simd_single_channel/adsp-2116x_vector_maximum.zip):


/*
[...]
Description:
			Subroutine that records the value and location of
                  the maximum value in a given (real) vector.

                  Equation:   MAX_VAL = MAX[INPUT]
                              MAX_INDEX = i given MAX_VAL = INPUT(i)

Calling Parameters:
			b0,i0 = address of input data
			l0 = 0
			r1 = number of samples / 4
			
Assumptions:
			All arrays must start on even normal-word address boundaries.
			All arrays must have a multiple of 4 length (pad with zeros if
			necessary).
			The integer values 4 & 3 should be appended to the 
			end of the input file as offsets to find MAX_INDEX.
			

Return Values:
			f4=MAX_VAL
			f2=MAX_INDEX
[...] */


vec_max:
            /* alu, multiplier precision, SIMD mode enable */
            bit set MODE1 RND32 | PEYEN;
            nop;

            f4=dm(i0,2);
            f0=pass f4;
            lcntr=r1, do vecmax until lce;	/* vector maximum loop */
            comp(f4,f0), f8=dm(i0,2);	      /* last read is appended offset for vector location */
            if le f4=pass f0, r2=i0;
            comp(f4,f8), f0=dm(i0,2);
vecmax:	    if le f4=pass f8, r2=i0;

            r2=r2-r0, r0=b0;                    
            r2=r2-r0, r0=s4;                    
            comp(f4,f0);                        
            if le f4=pass f0, r2<->s2;
            rts (db);
            bit clr MODE1 PEYEN;
            nop;

I understood the algorithm, but I have a few questions:
- Would this algorithm work without RND32? (I ask because the PEx/PEy
transfer at the end (r0=s4) is 32 bits wide.)
- How are the integers 4 and 3 put in memory? With r0=4;
dm(i0,0)=f0 ?
- Could we improve this algorithm (no overhead) like this? (Only the end
is changed.)
            r2=r2-r0, r0=b0;                    
            bit clr MODE1 PEYEN;
            r2=r2-r0, r0=s4;           /* always in SIMD mode */
            rts (db);                  /* 2 instructions */
            comp(f4,f0);               /* f4 = max for PEx, f0 = max for PEy */
            if le f4=pass f0, r2<->s2; /* effective rts */

- And finally, are DSP algorithms always so subtle?

Reply by Jon Harris ● June 24, 2003

"Marc Finet" <marcfinet@netcourrier.com> wrote in message
news:61f4baf2.0306230042.7a45faae@posting.google.com...
> I'm a newbie on DSP, so I got this example on the analog site
> (ftp://ftp.analog.com/pub/dsp/2116x/examples/simd_single_channel/adsp-2116x_vector_maximum.zip):
>
>
> /*
> [...]
> Description:
> Subroutine that records the value and location of
>                   the maximum value in a given (real) vector.
>
>                   Equation:   MAX_VAL = MAX[INPUT]
>                               MAX_INDEX = i given MAX_VAL = INPUT(i)
>
> Calling Parameters:
> b0,i0 = address of input data
> l0 = 0
> r1 = number of samples / 4
>
> Assumptions:
> All arrays must start on even normal-word address boundaries.
> All arrays must have a multiple of 4 length (pad with zeros if
> necessary).
> The integer values 4 & 3 should be appended to the
> end of the input file as offsets to find MAX_INDEX.
>
>
> Return Values:
> f4=MAX_VAL
> f2=MAX_INDEX
> [...] */
>
>
> vec_max:
>             /* alu, multiplier precision, SIMD mode enable */
>             bit set MODE1 RND32 | PEYEN;
>             nop;
>
>             f4=dm(i0,2);
>             f0=pass f4;
>             lcntr=r1, do vecmax until lce; /* vector maximum loop */
>             comp(f4,f0), f8=dm(i0,2);       /* last read is appended
> offset for vector location */
>             if le f4=pass f0, r2=i0;
>             comp(f4,f8), f0=dm(i0,2);
> vecmax:     if le f4=pass f8, r2=i0;
>
>             r2=r2-r0, r0=b0;
>             r2=r2-r0, r0=s4;
>             comp(f4,f0);
>             if le f4=pass f0, r2<->s2;
>             rts (db);
>             bit clr MODE1 PEYEN;
>             nop;
>
> I understood the algo, but i wanted to know different things :
> - would this algo without the RND32 ? ('cos PEx/Pey transfer at end
> (r0=s4) are 32 bit wide).

I think it would truncate to 32 bits rather than round.  That may or may not
be a significant difference, depending on your application.
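
As a rough illustration of truncating versus rounding when a 40-bit quantity is cut down to 32 bits, here is a minimal C sketch. It is a plain integer analogy under stated assumptions, not a model of the SHARC floating-point datapath; the helper names `drop8_truncate` and `drop8_round` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: `word40` is a 40-bit quantity held in the low bits
 * of a uint64_t, and the result is its upper 32 bits. Integer analogy only. */

/* Truncation: discard the 8 least significant bits. */
uint32_t drop8_truncate(uint64_t word40)
{
    assert(word40 < (1ULL << 40));
    return (uint32_t)(word40 >> 8);
}

/* Round to nearest, ties rounding up: add half of the dropped range first.
 * A carry out of bit 39 wraps around in this sketch; it is not handled. */
uint32_t drop8_round(uint64_t word40)
{
    assert(word40 < (1ULL << 40));
    return (uint32_t)((word40 + 0x80u) >> 8);
}
```

For example, `0x1FF` truncates to 1 but rounds to 2. The two results never differ by more than one least significant bit of the 32-bit value, which is why the difference may or may not matter for a given application.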

> - how the 4 and 3 integers are put in memory ? with a r0=4;
> dm(i0,0)=f0 ?

That should work just fine.
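
To make the role of the appended 4 and 3 concrete, here is a minimal C sketch of the buffer layout and the index arithmetic. It rests on my reading of the listing, which should be checked in a simulator: that in SIMD mode PEx loads from the address in i0 and PEy from the following word, and that i0 has already been post-incremented past the next pair when `r2=i0` executes. The names `pack_input` and `recover_index` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fill `words` (length n + 2) with the raw bit patterns of n float samples,
 * followed by the integer words 4 and 3 requested in the routine's header.
 * The offsets are stored as integers, not converted to floating point. */
void pack_input(const float *samples, size_t n, uint32_t *words)
{
    assert(sizeof(float) == sizeof(uint32_t));
    assert(n > 0 && n % 4 == 0);      /* length must be a multiple of 4 */
    memcpy(words, samples, n * sizeof(float));
    words[n]     = 4;                 /* assumed to land in PEx (r0) */
    words[n + 1] = 3;                 /* assumed to land in PEy (s0) */
}

/* Index recovery as done by the two `r2=r2-r0` instructions:
 * saved pointer minus appended offset minus base address. */
uint32_t recover_index(uint32_t saved_ptr, uint32_t offset, uint32_t base)
{
    return saved_ptr - offset - base;
}
```

Under those assumptions, the pointer saved while the first pair is compared is base + 4, so PEx recovers index 0 (offset 4) and PEy recovers index 1 (offset 3).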

> - could we improve (no overhead) this algo like that ? (only end
> changed)
>             r2=r2-r0, r0=b0;
>             bit clr MODE1 PEYEN;
>             r2=r2-r0, r0=s4;           /* always in SIMD mode */
>             rts (db);                  /* 2 instructions */
>             comp(f4,f0);               /* f4 = max for PEx, f0=max for
> PEy */
>             if le f4=pass f0, r2<->s2; /* effective rts */

Probably should simulate this to be sure.  I'm not sure if the disabling of
PEYEN would take effect immediately and then mess up the next instruction
(I'm not an expert on the SIMD stuff).

Keep in mind though, that you are only talking about saving 1 instruction at
the clean-up of the function call.  For any significant vector size, the
percentage saving would be minuscule.  Usually, one tries to put all the
work of optimization into the inner loops where large savings are possible.
Squeezing a cycle here or there out of code that is relatively infrequently
executed would be a last step if more performance was needed.
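
To put a number on that, here is a back-of-the-envelope C sketch. It assumes one cycle per instruction with no stalls, 12 instructions outside the loop (counted from the listing), and 4 loop instructions per 4 samples; the function name `percent_saved` is hypothetical.

```c
#include <assert.h>

/* Rough percentage of instructions saved by removing `saved` instructions
 * from a routine that executes `overhead` instructions outside its loop and
 * `per_sample` loop instructions per input sample. */
double percent_saved(unsigned n_samples, unsigned overhead,
                     double per_sample, unsigned saved)
{
    assert(n_samples > 0);
    double total = overhead + per_sample * n_samples;
    return 100.0 * saved / total;
}
```

With these assumptions, `percent_saved(1024, 12, 1.0, 1)` comes out just under 0.1 percent, while for a tiny 8-sample vector the same one-instruction saving is 5 percent.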

> - and finally are dsp algo always so subtle ?

The algorithms themselves are not always so subtle.  Fully hand-optimized
DSP assembly language code on the other hand usually is.  There are lots of
"tricks" played to squeeze out every last drop of performance.  Code like
this is usually difficult to read and maintain, but it often must be written
this way for maximum performance.  Sometimes studying optimized code is a
tough way to learn an algorithm.
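
For comparison, here is what the routine computes, written as a plain C sketch, plus a two-lane version that mimics the idea of splitting the vector between PEx (even indices) and PEy (odd indices) and merging at the end. Both functions are hypothetical illustrations, not translations of the assembly, and the optimized routine may resolve ties between equal maxima differently.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference: returns the index of the largest element of x[0..n-1]
 * and stores its value in *max_val. The first occurrence wins on ties. */
size_t vec_max_ref(const float *x, size_t n, float *max_val)
{
    assert(x != NULL && max_val != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; ++i) {
        if (x[i] > x[best]) {
            best = i;
        }
    }
    *max_val = x[best];
    return best;
}

/* Two-lane model: lane 0 scans even indices, lane 1 scans odd indices,
 * then the two lane winners are compared once at the end. */
size_t vec_max_two_lane(const float *x, size_t n, float *max_val)
{
    assert(x != NULL && max_val != NULL);
    assert(n >= 2 && n % 2 == 0);
    size_t best[2] = {0, 1};
    for (size_t i = 2; i < n; i += 2) {
        for (size_t lane = 0; lane < 2; ++lane) {
            if (x[i + lane] > x[best[lane]]) {
                best[lane] = i + lane;
            }
        }
    }
    /* Merge: the larger value wins; on equal values the lower index wins. */
    size_t winner = best[0];
    if (x[best[1]] > x[best[0]] ||
        (x[best[1]] == x[best[0]] && best[1] < best[0])) {
        winner = best[1];
    }
    *max_val = x[winner];
    return winner;
}
```

The two-lane version does the same number of comparisons but spreads them over two independent running maxima, which is the part the SIMD hardware executes in parallel.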

Reply by Jaime Andres Aranguren Cardona ● June 26, 2003

> Probably should simulate this to be sure.  I'm not sure if the disabling of
> PEYEN would take effect immediately and then mess up the next instruction
> (I'm not an expert on the SIMD stuff).

That's the point. Writes to MODE1 take 2 cycles, so inserting a NOP is
a good recommendation, or inserting another instruction which doesn't
rely on the MODE1 register.


Reply by marc-f ● June 29, 2003

On Tue, 24 Jun 2003 10:46:20 -0700, "Jon Harris"
<jon_harrisTIGER@hotmail.com> wrote:

> Probably should simulate this to be sure.  I'm not sure if the
> disabling of PEYEN would take effect immediately and then mess up the
> next instruction (I'm not an expert on the SIMD stuff).

I only have the documentation, with neither a processor nor an
emulator/simulator. Does a software-only simulator exist (i.e. one that
runs without a chip linked to the PC through the JTAG port, as with the
emulators seen on www.analog.com)?


 
> Keep in mind though, that you are only talking about saving 1
> instruction at the clean-up of the function call.  For any significant
> vector size, the percentage saving would be minuscule.  Usually, one
> tries to put all the work of optimization into the inner loops where
> large savings are possible. Squeezing a cycle here or there out of
> code that is relatively infrequently executed would be a last step if
> more performance was needed.

Since the DSP is a new environment for me, I forgot the "elementary"
coding rules. In fact, it was only in order to understand the process.


Thanks for the fast and complete answers.

Marc Finet

Reply by Jim Thomas ● June 29, 2003

marc-f wrote:
> 
> I only have the doc, neither proc, nor emulator/simulator. Does
> software-only simulator/emulator exist (i.e. without the chip linked to
> PC, like emulator seen on www.analog.com, using JTAG port) ?

Yes.  ADI has various trial versions of VisualDSP++ available for
download on their website.  Sometimes they offer a 30-day version,
sometimes they offer a crippled-but-never-expires version.  Sometimes
they offer both.  But whatever their trial-version-du-jour, it will
include a simulator.

-- 
Jim Thomas            Principal Applications Engineer  Bittware, Inc
jthomas@bittware.com  http://www.bittware.com          (703) 779-7770
Visualize whirled peas.
