ADSP 21160 questions on example

Started by Marc Finet June 23, 2003
I'm a newbie to DSP, so I got this example from the Analog Devices site
(ftp://ftp.analog.com/pub/dsp/2116x/examples/simd_single_channel/adsp-2116x_vector_maximum.zip):


/*
[...]
Description:
			Subroutine that records the value and location of
                  the maximum value in a given (real) vector.

                  Equation:   MAX_VAL = MAX[INPUT]
                              MAX_INDEX = i given MAX_VAL = INPUT(i)

Calling Parameters:
			b0,i0 = address of input data
			l0 = 0
			r1 = number of samples / 4
			
Assumptions:
			All arrays must start on even normal-word address boundaries.
			All arrays must have a multiple of 4 length (pad with zeros if necessary).
			The integer values 4 & 3 should be appended to the 
			end of the input file as offsets to find MAX_INDEX.
			

Return Values:
			f4=MAX_VAL
			f2=MAX_INDEX
[...] */


vec_max:
            /* alu, multiplier precision, SIMD mode enable */
            bit set MODE1 RND32 | PEYEN;
            nop;

            f4=dm(i0,2);
            f0=pass f4;
            lcntr=r1, do vecmax until lce;	/* vector maximum loop */
            comp(f4,f0), f8=dm(i0,2);	      /* last read is appended offset for vector location */
            if le f4=pass f0, r2=i0;
            comp(f4,f8), f0=dm(i0,2);
vecmax:	    if le f4=pass f8, r2=i0;

            r2=r2-r0, r0=b0;                    
            r2=r2-r0, r0=s4;                    
            comp(f4,f0);                        
            if le f4=pass f0, r2<->s2;
            rts (db);
            bit clr MODE1 PEYEN;
            nop;

I understand the algorithm, but I have a few questions:
- Would this algorithm work without RND32? (I ask because the PEx/PEy transfer at the end (r0=s4) is 32 bits wide.)
- How are the integers 4 and 3 put in memory? With r0=4; dm(i0,0)=f0?
- Could we improve this algorithm (remove the overhead) like this? (Only the end is changed.)
            r2=r2-r0, r0=b0;                    
            bit clr MODE1 PEYEN;
            r2=r2-r0, r0=s4;           /* always in SIMD mode */
            rts (db);                  /* 2 instructions */
            comp(f4,f0);               /* f4 = max for PEx, f0=max for PEy */
            if le f4=pass f0, r2<->s2; /* effective rts */

- And finally, are DSP algorithms always so subtle?
"Marc Finet" <marcfinet@netcourrier.com> wrote in message
news:61f4baf2.0306230042.7a45faae@posting.google.com...
> - would this algo without the RND32 ? ('cos PEx/Pey transfer at end
> (r0=s4) are 32 bit wide).
I think it would truncate to 32 bits, rather than rounding. That may or may not be a significant difference, depending on your application.
> - how the 4 and 3 integers are put in memory ? with a r0=4;
> dm(i0,0)=f0 ?
That should work just fine.
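
For anyone preparing the input on the host side instead, here is a minimal C sketch of the buffer layout that the "Assumptions" in the header comment describe: samples, zero padding up to a multiple of 4, then the integers 4 and 3 as raw 32-bit words. The names `padded_len` and `build_input` are hypothetical and not part of the ADI example, and the PEx/PEy remarks in the comments are only my reading of the listing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is a 32-bit type, as on the SHARC. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: round n up to the next multiple of 4. */
static size_t padded_len(size_t n)
{
    return (n + 3u) / 4u * 4u;
}

/* Hypothetical helper: fill buf with the layout vec_max expects.
 * buf must have room for padded_len(n) + 2 words.
 * Returns the padded sample count (divide by 4 to get the r1 argument). */
static size_t build_input(uint32_t *buf, const float *samples, size_t n)
{
    size_t padded = padded_len(n);

    memcpy(buf, samples, n * sizeof(float));              /* raw float bits */
    memset(buf + n, 0, (padded - n) * sizeof(uint32_t));  /* all-zero bits are 0.0f */

    buf[padded]     = 4u;  /* even address: as I read the listing, loaded by PEx */
    buf[padded + 1] = 3u;  /* odd address: as I read the listing, loaded by PEy */
    return padded;
}
```

With 5 samples this gives 8 sample words followed by the two offset words, so r1 would be 2.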
> - could we improve (no overhead) this algo like that ? (only end
> changed)
>             r2=r2-r0, r0=b0;
>             bit clr MODE1 PEYEN;
>             r2=r2-r0, r0=s4;           /* always in SIMD mode */
>             rts (db);                  /* 2 instructions */
>             comp(f4,f0);               /* f4 = max for PEx, f0=max for PEy */
>             if le f4=pass f0, r2<->s2; /* effective rts */
You should probably simulate this to be sure. I'm not sure whether disabling PEYEN would take effect immediately and then mess up the next instruction (I'm not an expert on the SIMD stuff). Keep in mind, though, that you are only talking about saving 1 instruction in the clean-up of the function call. For any significant vector size, the percentage saving would be minuscule. Usually, one tries to put all the work of optimization into the inner loops, where large savings are possible. Squeezing a cycle here or there out of code that is executed relatively infrequently would be a last step if more performance were needed.
> - and finally are dsp algo always so subtle ?
The algorithms themselves are not always so subtle. Fully hand-optimized DSP assembly language code on the other hand usually is. There are lots of "tricks" played to squeeze out every last drop of performance. Code like this is usually difficult to read and maintain, but it often must be written this way for maximum performance. Sometimes studying optimized code is a tough way to learn an algorithm.
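
To separate the algorithm from the tricks, here is a plain C model of what the listing appears to compute. It is only a sketch based on my reading of the assembly, not the ADI code: lane 0 stands in for PEx (even indices), lane 1 for PEy (odd indices), and the two lane results are merged at the end. The names `vec_max_result` and `vec_max_model` are made up for this illustration.

```c
#include <stddef.h>

/* Result of the search: maximum value and its position in the input. */
typedef struct {
    float  value;
    size_t index;
} vec_max_result;

/* Scalar model of a two-lane maximum search (illustrative only).
 * Lane 0 scans even indices and lane 1 scans odd indices; the lanes
 * are merged after the loop. n must be even and at least 2. */
static vec_max_result vec_max_model(const float *x, size_t n)
{
    float  best[2]  = { x[0], x[1] };
    size_t where[2] = { 0, 1 };

    for (size_t i = 2; i < n; i += 2) {
        for (size_t lane = 0; lane < 2; ++lane) {
            /* Same sense as "if le" in the listing: replace the current
             * maximum when it is less than or equal to the candidate. */
            if (best[lane] <= x[i + lane]) {
                best[lane]  = x[i + lane];
                where[lane] = i + lane;
            }
        }
    }

    /* Final merge, modelled on comp(f4,f0) / if le f4=pass f0, r2<->s2. */
    vec_max_result r = { best[0], where[0] };
    if (best[0] <= best[1]) {
        r.value = best[1];
        r.index = where[1];
    }
    return r;
}
```

Because the comparison is "less than or equal", ties resolve to the later element within a lane and to the odd lane in the merge. The assembly gets its index from the pointer value and the appended 4 and 3 offsets rather than from a loop counter, which is where most of the subtlety comes from.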
> Probably should simulate this to be sure. I'm not sure if the disabling of
> PEYEN would take effect immediately and then mess up the next instruction
> (I'm not an expert on the SIMD stuff).
That's the point. Writes to MODE1 take 2 cycles, so inserting a NOP is a good recommendation. Alternatively, insert another instruction that doesn't rely on the MODE1 register.
On Tue, 24 Jun 2003 10:46:20 -0700, "Jon Harris"
<jon_harrisTIGER@hotmail.com> wrote:

> Probably should simulate this to be sure. I'm not sure if the
> disabling of PEYEN would take effect immediately and then mess up the
> next instruction (I'm not an expert on the SIMD stuff).
I only have the documentation; I have neither the processor nor an emulator/simulator. Does a software-only simulator/emulator exist (i.e. one that works without a chip linked to the PC, unlike the emulators shown on www.analog.com, which use the JTAG port)?
> Keep in mind though, that you are only talking about saving 1
> instruction at the clean-up of the function call. For any significant
> vector size, the percentage saving would be minuscule. Usually, one
> tries to put all the work of optimization into the inner loops where
> large savings are possible. Squeezing a cycle here or there out of
> code that is relatively infrequently executed would be a last step if
> more performance was needed.
The DSP is a new environment for me, so I forgot the "elementary" coding rules. In fact, my question was only meant to help me understand the process. Thanks for the fast and complete answer.

Marc Finet
marc-f wrote:
> I only have the doc, neither proc, nor emulator/simulator. Does
> software-only simulator/emulator exist (i.e. without the chip linked to
> PC, like emulator seen on www.analog.com, using JTAG port) ?
Yes. ADI has various trial versions of VisualDSP++ available for download on their website. Sometimes they offer a 30-day version, sometimes they offer a crippled-but-never-expires version. Sometimes they offer both. But whatever their trial-version-du-jour, it will include a simulator.

--
Jim Thomas
Principal Applications Engineer
Bittware, Inc
jthomas@bittware.com
http://www.bittware.com
(703) 779-7770
Visualize whirled peas.