DSPRelated.com
Forums

fftw+openmp in Fortran

Started by neophyte November 26, 2010
Hello,

after trying for several weeks to find out why it doesn't work, I finally
think I should try to find a solution with your help.
Here is my problem:
We calculate a climate model, which consists of many arrays; the sizes of
the arrays depend on the chosen resolution.
MAIN.f:
      implicit double precision (a-h,o-z)
      external tend
      include 'RESOLUTION'   
.. more variables  
      common/coplan/iplanback(maxthr),iplanforw(maxthr)
      data iplanback/maxthr*0/,iplanforw/maxthr*0/
c
c++++ destroy plans for fft routines
      call clean_plan

.. a lot of reading and pre-calculations for startup

      time  = 0d0
      dtime = 0.2d0*step
      do while (time .lt. endtime)
         call tendencies(time,ym2,dydt)
         time = time + dtime
      enddo
..

subroutine tend(time,y,dydt)
      USE OMP_lib
      implicit double precision (a-h,o-z)
      external intend
      include 'RESOLUTION'
.. other variables

!$OMP PARALLEL PRIVATE(idthr,nthr,jlat,ij,iy,w1,w2,xij)
!$OMP& SHARED(createbackplans,createforwplans)
      nthr=OMP_GET_NUM_THREADS()
      idthr=OMP_GET_THREAD_NUM()+1
!$OMP SINGLE
      numthr=nthr 
!$OMP END SINGLE
!$OMP DO
      do 80 ij = 1,kh
      jlat = 2*(ij-1)+1
      xij=x(ij)
      call int(jlat,xij,w1,idthr)
      jlat = 2*(ij-1)+2
      xij=-x(ij)   
      call int(jlat,xij,w2,idthr)
      do iy = 1,ny
      work(iy,idthr) = work(iy,idthr)
     *               + w(ij)*(w1(iy)+w2(iy) )
      end do
80    continue
!$OMP END DO
!$OMP END PARALLEL
      do ij=1,numthr
      do 60 iy = 1,ny
  60  dydt(iy) = dydt(iy) + work(iy,ij)
      end do
      return
      end subroutine tend
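The structure of `tend` — each thread accumulating into its own column of the
shared `work` array, followed by a serial reduction into `dydt` — can be
sketched like this (a Python stand-in for illustration only, with made-up
sizes and weights; it is not part of the original code):

```python
# Per-thread-workspace pattern: each thread owns one row of `work`,
# and a serial reduction afterwards sums the rows into dydt.
import threading

ny, kh, nthreads = 4, 12, 3
w = [0.5] * kh                                 # stand-in quadrature weights
work = [[0.0] * ny for _ in range(nthreads)]   # one private row per thread

def chunk(tid):
    # Each thread touches only work[tid], so no lock is needed here.
    for ij in range(tid, kh, nthreads):
        for iy in range(ny):
            work[tid][iy] += w[ij] * (ij + iy)  # stand-in for w1(iy)+w2(iy)

threads = [threading.Thread(target=chunk, args=(t,)) for t in range(nthreads)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Serial reduction, like the "do 60" loop after !$OMP END PARALLEL.
dydt = [sum(work[t][iy] for t in range(nthreads)) for iy in range(ny)]
```

Because each thread writes only its own row, no synchronization is needed
inside the loop, and the fixed reduction order keeps the result independent
of thread scheduling.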

subroutine  int(jlat,x,w,idthr)
      USE OMP_lib
      implicit double precision (a-h,o-z)
      include 'RESOLUTION'
      dimension Fx(lev,0:2*nz), .more Fxs or Fys., x(irie,lev), .more xs or ys.
.. other variables
      common/coplan/iplanback(maxthr),iplanforw(maxthr)

.. initialization of Fxs(:,:)
    
      if (iplanback(idthr) .eq. 0 )  then 
!$OMP Critical (createbackplans)
         call create_fftback_plan(Fx,idthr)
!$OMP End Critical (createbackplans)
         iplanback(idthr) = 1
      endif
      call fftback( Fx, x, idthr)
.. many more call fftback(Fx., x., idthr)
.. then some calculations to xs
      if (iplanforw(idthr) .eq. 0 )  then 
!$OMP Critical (createforwplans)
         call create_fftforw_plan(Fx,idthr)
!$OMP End Critical (createforwplans)
         iplanforw(idthr) = 1
      endif
      call fftforw( y, Fy, idthr)
.. many more call fftforw(Fy., y., idthr)
      end subroutine int
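The create-once-per-thread guard in `int` has roughly this shape; here is a
minimal Python sketch (all names invented, with a single lock standing in for
the OpenMP critical section — the flag lives in a per-thread slot, while plan
creation itself is serialized):

```python
# Per-thread plan slots with lazily created plans: each thread checks
# only its own flag, but all planner calls go through one shared lock.
import threading

maxthr = 4
iplanback = [0] * maxthr          # one "plan already created" flag per thread
plans = [None] * maxthr           # per-thread plan slots, like plan(3,idthr)
planner_lock = threading.Lock()   # stand-in for the OpenMP critical section

def get_back_plan(idthr):
    if iplanback[idthr] == 0:     # each thread reads only its own flag
        with planner_lock:        # the planner itself runs serialized
            plans[idthr] = ("backplan", idthr)  # stand-in for plan creation
        iplanback[idthr] = 1
    return plans[idthr]

threads = [threading.Thread(target=get_back_plan, args=(i,))
           for i in range(maxthr)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Since backward- and forward-plan creation both enter the same planner, routing
both through one shared lock (rather than two differently named critical
sections) is the conservative choice.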

subroutine create_fftback_plan(Fx,idthr)
      USE OMP_lib
      implicit double precision (a-h,o-z)
      INCLUDE '../include/fftw3.f'
      include 'RESOLUTION'
      double precision store(lev,1:irie),ffty(lev,0:2*nz)
.. other variables for using fftw
      common/plancs/plan(4,maxthr)

.. some factorization to Fx and transmitting it to store
      call dfftw_plan_many_r2r(plan(3, idthr),irank,irie,multiple,
     &       store,inembed, istride, idist,store,onembed,
     &       ostride, odist, FFTW_HC2R, FFTW_MEASURE)
      end subroutine create_fftback_plan

Similar to this is the subroutine create_fftforw_plan(Fx,idthr).

subroutine fftback( Fx, x, idthr)
      USE OMP_lib
      implicit double precision (a-h,o-z)
      INCLUDE '../include/fftw3.f'
      include 'RESOLUTION'
      double precision store(lev,1:irie),ffty(lev,0:2*nz),
     &                 Yfftrueck(irie,lev)
      integer*8 plan
      common/plancs/plan(4,maxthr)
.. factorize the input and transmitting to store
      call dfftw_execute_r2r(plan(3, idthr),store,store)
.. factorize the output and transmitting to x
      end subroutine fftback

Again similar is subroutine fftforw( y, Fy, idthr).

And now the following problem happens. If I run the program many times with
only one thread, it always produces the same output. But when I start it
with 2 or more threads, the results of the fftw routines differ; not all of
them, but many. And because other calculations are done later, the error
grows. If I run the program at a coarse resolution the errors are not
significant, but at a fine resolution they cause big trouble: after a few
steps I get only zeros or NaNs. One thing I already tried: at first I was
doing the planning for fftw before entering the parallel loop, but then I
found that, for example, the results from 5 threads were fine (no
difference) while those from the other 3 threads differed; that's why I now
plan for each thread separately.
And now I have no more ideas about what I could or should do differently.

So if anybody can help me, I would appreciate it. And sorry for the long
post; I tried to include everything that could be of interest.
Thanks, Corinna

On Nov 27, 4:40 am, "neophyte" <schuett@n_o_s_p_a_m.iap-kborn.de>
wrote:
> [original post quoted in full; snipped]
I once used Fortran - about 30 odd years ago! Bloody hell man - get with
it! Climate model? I assume it must be BS in that case.

Hardy
On Nov 26, 3:40 pm, "neophyte" <schuett@n_o_s_p_a_m.iap-kborn.de>
wrote:
> [original post quoted in full; snipped]
Have you tried using the parameter FFTW_THREADSAFE?

I think FFTW is not totally thread-safe. It uses some global data, and I
think the planner uses accumulated 'wisdom'. That may be the source of your
problem, though I am not sure from the code. The parameter FFTW_THREADSAFE
should solve this.

You do know there are threaded versions of the FFTW API? These are the same
functions but with _thread appended. But these also are not totally
thread-safe, so you still have to use the flag.

Chris
==================
BORES Signal Processing
www.bores.com
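To make the thread-safety point concrete: a common discipline with a
non-thread-safe planner is to route every planning call through one global
lock, while letting execution run concurrently on per-thread buffers. A
minimal Python sketch (the function names here are stand-ins, not FFTW
calls):

```python
# Plan under a global lock, execute concurrently on private buffers.
import threading

planner_lock = threading.Lock()
results = [None] * 4

def make_plan(n):
    # Every planning call, for every transform kind, takes the same lock.
    with planner_lock:
        return ("plan", n)        # stand-in for a planner call

def execute(plan, data):
    # Executing an existing plan is pure and reentrant here, so it is
    # safe to call from several threads on distinct buffers.
    return [2 * x for x in data]

def worker(tid):
    plan = make_plan(8)
    results[tid] = execute(plan, [tid] * 3)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```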
On Nov 27, 10:20 am, HardySpicer <gyansor...@gmail.com> wrote:
> [quoted posts snipped]
FORTRAN is still very commonly used for physics, and for writing C
compilers. :-)

Chris
===========================
Chris Bore
BORES Signal Processing
www.bores.com
> [quoted reply snipped]
Hello Chris,

thanks for your response. I tried FFTW_THREADSAFE, but unfortunately it
didn't help. At first I thought it had, because in the first and second
time steps no error occurred, but then already in the third step the first
result differed from the expected one. At the moment I don't know what else
I could try.

But thanks anyway,
Corinna