Hello,

after trying for several weeks to find out why it doesn't work, I finally think I should look for a solution with your help. Here is my problem: we run a climate model, which consists of many arrays whose sizes depend on the chosen resolution.

MAIN.f:

      implicit double precision (a-h,o-z)
      external tend
      include 'RESOLUTION'
.. more variables
      common/coplan/iplanback(maxthr),iplanforw(maxthr)
      data iplanback/maxthr*0d0/,iplanforw/maxthr*0d0/
c
c++++ destroy plans for fft routines
      call clean_plan

.. a lot of reading and pre-calculations for startup

      time  = 0d0
      dtime = 0.2d0*step
      do time = 0, endtime
         call tendencies(time,ym2,dydt)
         time = time + dtime
      enddo
..

      subroutine tend(time,y,dydt)
      USE OMP_lib
      implicit double precision (a-h,o-z)
      external intend
      include 'RESOLUTION'
.. other variables

!$OMP PARALLEL PRIVATE(idthr,nthr,jlat,ij,iy,w1,w2,xij)
!$OMP& SHARED(createbackplans,createforwplans)
      nthr=OMP_GET_NUM_THREADS()
      idthr=OMP_GET_THREAD_NUM()+1
!$OMP SINGLE
      numthr=nthr
!$OMP END SINGLE
!$OMP DO
      do 80 ij = 1,kh
         jlat = 2*(ij-1)+1
         xij=x(ij)
         call int(jlat,xij,w1,idthr)
         jlat = 2*(ij-1)+2
         xij=-x(ij)
         call int(jlat,xij,w2,idthr)
         do iy = 1,ny
            work(iy,idthr) = work(iy,idthr)
     *                     + w(ij)*(w1(iy)+w2(iy))
         end do
80    continue
!$OMP END DO
!$OMP END PARALLEL
      do ij=1,numthr
         do 60 iy = 1,ny
  60     dydt(iy) = dydt(iy) + work(iy,ij)
      end do
      return
      end subroutine tend

      subroutine int(jlat,x,w,idthr)
      USE OMP_lib
      implicit double precision (a-h,o-z)
      include 'RESOLUTION'
      dimension Fx(lev,0:2*nz), .more Fxs or Fys., x(irie,lev),
     &          .more xs or ys.
.. other variables
      common/coplan/iplanback(maxthr),iplanforw(maxthr)

.. initialization of Fxs(:,:)

      if (iplanback(idthr) .eq. 0) then
!$OMP Critical (createbackplans)
         call create_fftback_plan(Fx,idthr)
!$OMP End Critical (createbackplans)
         iplanrueck(idthr) = 1
      endif
      call fftback( Fx, x, idthr)
.. many more call fftback(Fx., x., idthr)
.. then some calculations to xs
      if (iplanforw(idthr) .eq. 0) then
!$OMP Critical (createforwplans)
         call create_fftforw_plan(Fx,idthr)
!$OMP End Critical (createforwplans)
         iplanforw(idthr) = 1
      endif
      call fftforw( y, Fy, idthr)
.. many more call fftforw(Fy., y., idthr)
      end subroutine int

      subroutine create_fftback_plan(Fx,idthr)
      USE OMP_lib
      implicit double precision (a-h,o-z)
      INCLUDE '../include/fftw3.f'
      include 'RESOLUTION'
      double precision store(lev,1:irie),ffty(lev,0:2*nz)
.. other variables for using fftw
      common/plancs/plan(4,maxthr)

.. some factorization to Fx and transmitting it to store
      call dfftw_plan_many_r2r(plan(3,idthr),irank,irie,multiple,
     &       store,inembed, istride, idist, store,onembed,
     &       ostride, odist, FFTW_HC2R, FFTW_MEASURE)
      end subroutine create_fftback_plan

Similar to this is the subroutine create_fftforw_plan(Fx,idthr).

      subroutine fftback( Fx, x, idthr)
      USE OMP_lib
      implicit double precision (a-h,o-z)
      INCLUDE '../include/fftw3.f'
      include 'RESOLUTION'
      double precision store(lev,1:irie),ffty(lev,0:2*nz),
     &                 Yfftrueck(irie,lev)
      integer*8 plan
      common/plancs/plan(4,maxthr)
.. factorize the input and transmit it to store
      call dfftw_execute_r2r(plan(3,idthr),store,store)
.. factorize the output and transmit it to x
      end subroutine fftback

Again similar is subroutine fftforw( y, Fy, idthr).

And now the following problem happens. If I run the program many times with only one thread, it always produces the same output. But when I start it with two or more threads, the results of the FFTW routines differ (not all of them, but many). And because other calculations are done later, the error grows. If I run the program at a coarse resolution the errors are not significant, but at a fine resolution they cause big trouble: after a few steps I get only zeros or NaNs.

What I already figured out before: at first I was doing the planning for FFTW before entering the parallel loop, but then I found that, for example, the results from 5 threads were fine (no difference) while those from the other 3 threads differed; that's why I now plan for each thread separately. And now I have no more idea what I could or should do differently.

So if anybody can help me, I would appreciate it. And sorry for the long posting, I tried to include everything that could be of interest.

Thanks, Corinna
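[Editorial note for readers hitting the same issue: the FFTW documentation states that creating and destroying plans uses global planner state and is not thread-safe, while executing an already-created plan is. A common fix is therefore to create every per-thread plan in serial code, before the first parallel region, and only execute plans inside it. A minimal, untested sketch against the common blocks from the post above; the wrapper name make_all_plans and its argument list are hypothetical:]

      subroutine make_all_plans(Fx, nthreads)
c     Sketch only: call once from MAIN.f, before any OMP PARALLEL
c     region.  The dfftw_plan_* routines share global planner state
c     and must never run concurrently; dfftw_execute_* may then be
c     called freely in parallel, one plan slot per thread.
      implicit double precision (a-h,o-z)
      include 'RESOLUTION'
      dimension Fx(lev,0:2*nz)
      do it = 1, nthreads
         call create_fftback_plan(Fx, it)
         call create_fftforw_plan(Fx, it)
      end do
      end subroutine make_all_plans

[MAIN.f would then call, e.g., make_all_plans(Fx, omp_get_max_threads()) right after clean_plan, and the iplanback/iplanforw flag logic in subroutine int could be dropped entirely.]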
fftw+openmp in Fortran
Started by ●November 26, 2010
Reply by ●November 27, 2010
On Nov 27, 4:40 am, "neophyte" <schuett@n_o_s_p_a_m.iap-kborn.de> wrote:
> [original post quoted in full - snip]
> Thanks Corinna

I once used Fortran - about 30 odd years ago! Bloody hell man - get with it!
Climate model? I assume it must be BS in that case.

Hardy
Reply by ●November 29, 2010
On Nov 26, 3:40 pm, "neophyte" <schuett@n_o_s_p_a_m.iap-kborn.de> wrote:
> [original post quoted in full - snip]
> Thanks Corinna

Have you tried using the parameter FFTW_THREADSAFE?

I think FFTW is not totally thread-safe. It uses some global data, and I think the planner uses accumulated 'wisdom'. That may be the source of your problem, though I am not sure from the code. The parameter FFTW_THREADSAFE should solve this.

You do know there are threaded versions of the FFTW API? These are the same functions but with _thread appended. But these also are not totally thread-safe, so you still have to use the flag.

Chris
==================
BORES Signal Processing
www.bores.com
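[Editorial note, a guess from the posted code: OpenMP named CRITICAL sections only exclude other regions with the *same* name, so the createbackplans and createforwplans criticals in subroutine int can still run at the same time - one thread creating a backward plan while another creates a forward plan, which races inside FFTW's planner. Using a single name for every planner call would serialize them all; an untested sketch reusing the variables from the original post:]

      if (iplanback(idthr) .eq. 0) then
!$OMP CRITICAL (fftwplanner)
c        one shared name: no two planner calls of ANY kind overlap
         call create_fftback_plan(Fx, idthr)
!$OMP END CRITICAL (fftwplanner)
         iplanback(idthr) = 1
      endif
      ...
      if (iplanforw(idthr) .eq. 0) then
!$OMP CRITICAL (fftwplanner)
         call create_fftforw_plan(Fx, idthr)
!$OMP END CRITICAL (fftwplanner)
         iplanforw(idthr) = 1
      endif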
Reply by ●November 29, 2010
On Nov 27, 10:20 am, HardySpicer <gyansor...@gmail.com> wrote:
> On Nov 27, 4:40 am, "neophyte" <schuett@n_o_s_p_a_m.iap-kborn.de> wrote:
> > [original post quoted in full - snip]
>
> I once used Fortran - about 30 odd years ago! Bloody hell man - get
> with it!
> Climate model? I assume it must be BS in that case.
>
> Hardy

FORTRAN is still very commonly used for physics, and for writing C compilers. :-)

Chris
===========================
Chris Bore
BORES Signal Processing
www.bores.com
Reply by ●December 2, 2010
>Have you tried using the parameter FFTW_THREADSAFE?
>[rest of reply snipped]
>
>Chris
>==================
>BORES Signal Processing
>www.bores.com

Hello Chris,

thanks for your response. I tried FFTW_THREADSAFE, but unfortunately it didn't help. At first I thought it had, because in the first and second time steps no error occurred, but then already in the third step the first result differed from the expected one.

At the moment I don't know what else I could try. But thanks anyway,

Corinna