Hello,

after trying for several weeks to find out why it doesn't work, I finally think I should look for a solution with your help. Here is my problem: we run a climate model, which consists of many arrays whose sizes depend on the chosen resolution.

MAIN.f:

      implicit double precision (a-h,o-z)
      external tend
      include 'RESOLUTION'
.. more variables
      common/coplan/iplanback(maxthr),iplanforw(maxthr)
      data iplanback/maxthr*0d0/,iplanforw/maxthr*0d0/
c
c++++ destroy plans for fft routines
      call clean_plan

.. a lot of reading and pre-calculations for startup

      time  = 0d0
      dtime = 0.2d0*step
      do time = 0, endtime
         call tendencies(time,ym2,dydt)
         time = time + dtime
      enddo
..

      subroutine tend(time,y,dydt)
      USE OMP_lib
      implicit double precision (a-h,o-z)
      external intend
      include 'RESOLUTION'
.. other variables

!$OMP PARALLEL PRIVATE(idthr,nthr,jlat,ij,iy,w1,w2,xij)
!$OMP& SHARED(createbackplans,createforwplans)
      nthr=OMP_GET_NUM_THREADS()
      idthr=OMP_GET_THREAD_NUM()+1
!$OMP SINGLE
      numthr=nthr
!$OMP END SINGLE
!$OMP DO
      do 80 ij = 1,kh
         jlat = 2*(ij-1)+1
         xij=x(ij)
         call int(jlat,xij,w1,idthr)
         jlat = 2*(ij-1)+2
         xij=-x(ij)
         call int(jlat,xij,w2,idthr)
         do iy = 1,ny
            work(iy,idthr) = work(iy,idthr)
     *                     + w(ij)*(w1(iy)+w2(iy))
         end do
80    continue
!$OMP END DO
!$OMP END PARALLEL
      do ij=1,numthr
         do 60 iy = 1,ny
  60     dydt(iy) = dydt(iy) + work(iy,ij)
      end do
      return
      end subroutine tend

      subroutine int(jlat,x,w,idthr)
      USE OMP_lib
      implicit double precision (a-h,o-z)
      include 'RESOLUTION'
      dimension Fx(lev,0:2*nz), .more Fxs or Fys., x(irie,lev),
     &          .more xs or ys.
.. other variables
      common/coplan/iplanback(maxthr),iplanforw(maxthr)

.. initialization of Fxs(:,:)

      if (iplanback(idthr) .eq. 0) then
!$OMP Critical (createbackplans)
         call create_fftback_plan(Fx,idthr)
!$OMP End Critical (createbackplans)
         iplanrueck(idthr) = 1
      endif
      call fftback( Fx, x, idthr)
.. many more call fftback(Fx., x., idthr)
.. then some calculations to xs
      if (iplanforw(idthr) .eq. 0) then
!$OMP Critical (createforwplans)
         call create_fftforw_plan(Fx,idthr)
!$OMP End Critical (createforwplans)
         iplanforw(idthr) = 1
      endif
      call fftforw( y, Fy, idthr)
.. many more call fftforw(Fy., y., idthr)
      end subroutine int

      subroutine create_fftback_plan(Fx,idthr)
      USE OMP_lib
      implicit double precision (a-h,o-z)
      INCLUDE '../include/fftw3.f'
      include 'RESOLUTION'
      double precision store(lev,1:irie),ffty(lev,0:2*nz)
.. other variables for using fftw
      common/plancs/plan(4,maxthr)

.. some factorization to Fx and transmitting it to store
      call dfftw_plan_many_r2r(plan(3,idthr),irank,irie,multiple,
     &       store,inembed, istride, idist, store,onembed,
     &       ostride, odist, FFTW_HC2R, FFTW_MEASURE)
      end subroutine create_fftback_plan

Similar to this is the subroutine create_fftforw_plan(Fx,idthr).

      subroutine fftback( Fx, x, idthr)
      USE OMP_lib
      implicit double precision (a-h,o-z)
      INCLUDE '../include/fftw3.f'
      include 'RESOLUTION'
      double precision store(lev,1:irie),ffty(lev,0:2*nz),
     &                 Yfftrueck(irie,lev)
      integer*8 plan
      common/plancs/plan(4,maxthr)
.. factorize the input and transmit it to store
      call dfftw_execute_r2r(plan(3,idthr),store,store)
.. factorize the output and transmit it to x
      end subroutine fftback

Again similar is subroutine fftforw( y, Fy, idthr).

And now the following problem happens. If I run the program many times with only one thread, it always produces the same output. But when I start it with two or more threads, the results of the FFTW routines differ (not all of them, but many). And because other calculations are done later, the error grows. If I run the program at a coarse resolution the errors are not significant, but at a fine resolution they cause big trouble: after a few steps I get only zeros or NaNs.

What I already figured out before: at first I was doing the planning for FFTW before entering the parallel loop, but then I found that, for example, the results from 5 threads were fine (no difference) while those from the other 3 threads differed; that's why I now plan for each thread separately. And now I have no more idea what I could or should do differently.

So if anybody can help me, I would appreciate it. And sorry for the long posting, I tried to include everything that could be of interest.

Thanks, Corinna
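[Editorial note for readers hitting the same issue: the FFTW documentation states that creating and destroying plans uses global planner state and is not thread-safe, while executing an already-created plan is. A common fix is therefore to create every per-thread plan in serial code, before the first parallel region, and only execute plans inside it. A minimal, untested sketch against the common blocks from the post above; the wrapper name make_all_plans and its argument list are hypothetical:]

      subroutine make_all_plans(Fx, nthreads)
c     Sketch only: call once from MAIN.f, before any OMP PARALLEL
c     region.  The dfftw_plan_* routines share global planner state
c     and must never run concurrently; dfftw_execute_* may then be
c     called freely in parallel, one plan slot per thread.
      implicit double precision (a-h,o-z)
      include 'RESOLUTION'
      dimension Fx(lev,0:2*nz)
      do it = 1, nthreads
         call create_fftback_plan(Fx, it)
         call create_fftforw_plan(Fx, it)
      end do
      end subroutine make_all_plans

[MAIN.f would then call, e.g., make_all_plans(Fx, omp_get_max_threads()) right after clean_plan, and the iplanback/iplanforw flag logic in subroutine int could be dropped entirely.]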
fftw+openmp in Fortran
Started by ●November 26, 2010
Reply by ●November 27, 2010
On Nov 27, 4:40 am, "neophyte" <schuett@n_o_s_p_a_m.iap-kborn.de> wrote:
> [original post quoted in full - snip]
> Thanks Corinna

I once used Fortran - about 30 odd years ago! Bloody hell man - get with it!
Climate model? I assume it must be BS in that case.

Hardy
Reply by ●November 29, 2010
On Nov 26, 3:40 pm, "neophyte" <schuett@n_o_s_p_a_m.iap-kborn.de> wrote:
> [original post quoted in full - snip]
> Thanks Corinna

Have you tried using the parameter FFTW_THREADSAFE?

I think FFTW is not totally thread-safe. It uses some global data, and I think the planner uses accumulated 'wisdom'. That may be the source of your problem, though I am not sure from the code. The parameter FFTW_THREADSAFE should solve this.

You do know there are threaded versions of the FFTW API? These are the same functions but with _thread appended. But these also are not totally thread-safe, so you still have to use the flag.

Chris
==================
BORES Signal Processing
www.bores.com
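[Editorial note, a guess from the posted code: OpenMP named CRITICAL sections only exclude other regions with the *same* name, so the createbackplans and createforwplans criticals in subroutine int can still run at the same time - one thread creating a backward plan while another creates a forward plan, which races inside FFTW's planner. Using a single name for every planner call would serialize them all; an untested sketch reusing the variables from the original post:]

      if (iplanback(idthr) .eq. 0) then
!$OMP CRITICAL (fftwplanner)
c        one shared name: no two planner calls of ANY kind overlap
         call create_fftback_plan(Fx, idthr)
!$OMP END CRITICAL (fftwplanner)
         iplanback(idthr) = 1
      endif
      ...
      if (iplanforw(idthr) .eq. 0) then
!$OMP CRITICAL (fftwplanner)
         call create_fftforw_plan(Fx, idthr)
!$OMP END CRITICAL (fftwplanner)
         iplanforw(idthr) = 1
      endif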
Reply by ●November 29, 2010
On Nov 27, 10:20 am, HardySpicer <gyansor...@gmail.com> wrote:
> On Nov 27, 4:40 am, "neophyte" <schuett@n_o_s_p_a_m.iap-kborn.de> wrote:
> > [original post quoted in full - snip]
>
> I once used Fortran - about 30 odd years ago! Bloody hell man - get
> with it!
> Climate model? I assume it must be BS in that case.
>
> Hardy

FORTRAN is still very commonly used for physics, and for writing C compilers. :-)

Chris
===========================
Chris Bore
BORES Signal Processing
www.bores.com
Reply by ●December 2, 2010
>Have you tried using the parameter FFTW_THREADSAFE?
>[rest of reply snipped]
>
>Chris
>==================
>BORES Signal Processing
>www.bores.com

Hello Chris,

thanks for your response. I tried FFTW_THREADSAFE, but unfortunately it didn't help. At first I thought it had, because in the first and second time steps no error occurred, but then already in the third step the first result differed from the expected one.

At the moment I don't know what else I could try. But thanks anyway,

Corinna