Thanks for the direction. I can't vectorize the loop because I am using a SISD processor (21060). Anyway, I have implemented the same loop as an asm function and am calling it from the C program. I am saving 10,000+ cycles this way, which is what I wanted to do.

Thanks again,
regards
Liyju

--- Mike Rosing <> wrote:
> On Sun, 9 May 2004, Liyju Janardhan wrote:
>
> > Following is the loop which I want to optimize.
> >
> > for (i=0;i<(num/2);i++)
> > {
> >     o = r_out[num-i] + r_out[i];
> >     x[i] = o*o;
> >
> >     o = i_out[num-i] + i_out[i];
> >     y[i] = o*o;
> > }
> >
> > r_out, i_out, x and y are in data memory. o is a local
> > variable, hence stored on the stack. Putting o in program
> > memory may increase the speed.
>
> o is a temp, let the compiler leave it as a register
> for speed.
>
> > There are some loop optimization pragmas, how
> > are they used?
> > What does it mean by vectorizing loop?
>
> The modern SHARCs have 2 ALUs. A vector loop uses the same operation
> on 2 different sets of data (Single Instruction, Multiple Data == SIMD).
> In one ALU you perform the x calculation and in the other ALU you perform
> the y calculation.
>
> Getting a compiler to see this kind of optimization is really hard.
> Usually you must do it by hand. Fortunately it's pretty easy for this
> problem. You first have to set up the ALUs so all the pointers are
> correct, then let 'em rip (as the beyblade kids say).
>
> Patience, persistence, truth,
> Dr. mike
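
For anyone reading the thread later, here is a minimal C sketch of the loop with Mike's suggestion applied: the temporaries are plain locals so the compiler can keep them in registers, and the x and y calculations are written as two independent streams. The function name `fold_and_square`, the `float` element type, and the use of C99 `restrict` are assumptions for illustration; the original post does not give them. Note that the loop reads index `num` when `i == 0`, so the input arrays are assumed to hold `num + 1` elements.

```c
/* Hypothetical helper (name and float type are assumptions, not from the post).
 * r_out and i_out must hold num + 1 elements: index num is read when i == 0.
 * x and y must hold at least num / 2 elements. */
void fold_and_square(const float *restrict r_out,
                     const float *restrict i_out,
                     float *restrict x,
                     float *restrict y,
                     int num)
{
    const int half = num / 2;

    for (int i = 0; i < half; i++) {
        /* Separate locals for each stream: the compiler is free to keep
         * both in registers, and neither depends on the other. */
        const float re = r_out[num - i] + r_out[i];
        const float im = i_out[num - i] + i_out[i];

        x[i] = re * re;
        y[i] = im * im;
    }
}
```

Using two temporaries instead of reusing `o` removes the false dependency between the x and y calculations. Whether that helps on a given SHARC part depends on the compiler and optimization settings, so it is worth checking the generated assembly and cycle counts before replacing a hand-written asm routine.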