On 6/17/15 4:08 PM, Marcel Mueller wrote:
> On 17.06.15 01.49, scorprulebad wrote:
> [...]
>> In the next step I wanted to convolve each block with a slightly different
>> impulse response. This led to a crackling noise between the blocks when
>> the impulse response changed.
>
> Obviously.
>

non-TI system with a slow sampling rate and a zero-order hold on the 
changing parameters that generate the changing impulse response.

>> After reading some tutorials on this website
>> i found out that i have to apply a window and let the convolved
>> blocks overlap, respectively get some redundancy in the audio blocks.
>
> Fading over with a window function might work. It will fairly well
> compensate for changes of the amplitude of certain frequencies in the
> impulse response. But it will not adequately compensate for phase
> changes. They will cause some frequencies to cancel if done too fast.
>
>> I didn't get
>> how to do it exactly. Let us consider we want to reach an overlap of
>> 50% (hop size 513/2 samples?) to use for example the hamming window.
>
> No, this window does not come to zero.

interesting he would choose a Hamming window for this.  the sidelobes 
die reasonably fast, but it ain't continuous.

overlap-add (OLA) and overlap-save (OLS, a.k.a. "overlap-scrap") need 
only a rectangular window for the case of constant FIR 
coefficients.  (my personal doctrine is that there is no window with 
OLS, but there clearly is with OLA because of the zero-padding of x[n].) 
in the case of time-changing coefficients, OLA will crossfade from one 
FIR to the next, whereas OLS will have a jump discontinuity.
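
To make the constant-coefficient case concrete, here is a minimal sketch of overlap-add with rectangular (non-overlapping) input blocks. The function name `overlap_add_fir`, the random test signal and the block length of 513 (taken from the original question) are illustrative assumptions, not anything from the posts above.

```python
import numpy as np

def overlap_add_fir(x, h, block_len):
    """Overlap-add with rectangular input blocks and a constant FIR h."""
    L = len(h)
    n_fft = 1
    while n_fft < block_len + L - 1:      # FFT must hold the full linear convolution
        n_fft *= 2
    H_spec = np.fft.rfft(h, n_fft)        # spectrum of the zero-padded FIR
    y = np.zeros(len(x) + L - 1)
    for start in range(0, len(x), block_len):
        block = x[start:start + block_len]            # the last block may be shorter
        yk = np.fft.irfft(np.fft.rfft(block, n_fft) * H_spec, n_fft)
        n_out = len(block) + L - 1
        y[start:start + n_out] += yk[:n_out]          # block tails overlap and add
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(4000)     # stand-in for the audio signal
h = rng.standard_normal(512)      # stand-in for a 512-tap impulse response
y = overlap_add_fir(x, h, block_len=513)
print(np.allclose(y, np.convolve(x, h)))
```

With a fixed h[n] the result matches direct convolution, which is why no tapered window is needed in that case.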

OLA is used also with non-LTI stuff like the phase-vocoder.  for a 
constant-coefficient FIR this would be a little overkill, but this might 
be what you want for a time-varying FIR with frame-by-frame convolution: 
your data need not be rectangularly windowed.  but you still must 
satisfy complementarity.

     +inf
     SUM{ w[n-k*H] }  =  1      for all integer n
     k=-inf

where H is the frame hop or frame stride in samples.  for 50% overlap, 
H is half the frame (window) length.  for 75% overlap, H is a quarter 
of it.  then split your input into frames that are

     +inf
     SUM{ w[n-k*H] * x[n] }  =  x[n]
     k=-inf

     +inf
     SUM{ x_k[n] }  =  x[n]
     k=-inf

     x_k[n] = w[n-k*H] * x[n]

and frame output is

     y_k[n] = x_k[n] (*) h[n]

where (*) means linear convolution.  the linear convolution can be done 
fast with the FFT if the non-zero length of h[n] plus the non-zero 
length of x_k[n] (which is the length of w[n]), minus one, fits in the 
FFT.  the length of w[n] might be 2*H and if L is the longest non-zero 
length of h[n], then the FFT size N must be at least

    N = 2*H + L - 1

that tells you how long your FIR is allowed to be.  make sure that the 
spectrum that you're multiplying with

    H[k] = DFT{ h[n] }

*is* the DFT of only an h[n] that is no longer than L samples.
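
A small sketch of why the FFT size matters, assuming H = 256 and L = 512 purely for illustration. The helper `fft_convolve` is a hypothetical name, and the random arrays only stand in for a windowed frame and an impulse response.

```python
import numpy as np

H = 256                 # hop; the frame (window) length is 2*H
L = 512                 # non-zero length of h[n]
N = 2 * H + L - 1       # smallest FFT size that holds the linear convolution

rng = np.random.default_rng(1)
xk = rng.standard_normal(2 * H)     # stands in for one windowed frame x_k[n]
h = rng.standard_normal(L)

def fft_convolve(xk, h, n_fft):
    """Circular convolution of length n_fft, computed with the FFT."""
    return np.fft.irfft(np.fft.rfft(xk, n_fft) * np.fft.rfft(h, n_fft), n_fft)

linear = np.convolve(xk, h)             # reference, length 2*H + L - 1
ok = fft_convolve(xk, h, N)             # FFT is long enough
short = fft_convolve(xk, h, N - 100)    # FFT too short: the tail wraps around

print(np.allclose(ok, linear))
print(np.allclose(short, linear[:N - 100]))
```

With the shorter FFT the last 100 output samples fold back onto the start of the frame (time aliasing), so the second comparison fails. The same thing happens if the h[n] behind H[k] is longer than L.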

use a **Hann** window (sometimes called "hanning" even though there was 
no Dr. von Hanning to name the window after, and because of that silly 
historical and semantic screwup among the early DSPers it has even been 
conflated with the Hamming window, which is very similar: a "Hann window 
on a platform"), not the Hamming.  any window that is continuous 
and complementary can be tried out for this.

another window (of non-zero width of 2H):

   w[n] =  9/16*cos(pi*n/H) - 1/16*cos(3*pi*n/H) + 1/2    for  |n| < H,  0 otherwise

there are others to try, but i would start with the Hann

   w[n] =  1/2*cos(pi*n/H) + 1/2    for  |n| < H,  0 otherwise
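
The complementarity condition can be checked numerically. The sketch below assumes H = 256 and samples each window on n = -H ... H-1. The helper `overlap_sum` and the variable names are illustrative. A Hamming window written in the same form is included only for comparison.

```python
import numpy as np

H = 256
n = np.arange(-H, H)            # 2*H samples: one frame

hann = 0.5 * np.cos(np.pi * n / H) + 0.5
alt = 9/16 * np.cos(np.pi * n / H) - 1/16 * np.cos(3 * np.pi * n / H) + 0.5
hamming = 0.46 * np.cos(np.pi * n / H) + 0.54

def overlap_sum(w, hop, n_frames=8):
    """Add n_frames copies of w, each shifted by hop; return a fully overlapped middle part."""
    total = np.zeros(hop * (n_frames - 1) + len(w))
    for k in range(n_frames):
        total[k * hop:k * hop + len(w)] += w
    return total[len(w):-len(w)]

for name, w in [("hann", hann), ("alt", alt), ("hamming", hamming)]:
    s = overlap_sum(w, H)
    print(name, "sum min/max:", s.min(), s.max(), "edge value:", w[0])
```

Both windows given above sum to 1 and start at zero. The Hamming window sums to a constant 1.08 at this hop, but its edge value is 0.08, so each frame begins with a small step rather than a smooth fade.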

the total output is

     +inf
     SUM{ y_k[n] }  =  y[n]
     k=-inf

if you do that and get crackling, i would bet that either you're doing 
something wrong (not as per above) or you're modulating h[n] so wildly 
that no implementation will save your ass (or your ears).
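
The whole recipe above might be sketched as follows. This is only one possible reading of it: the function `time_varying_ola`, the callback `fir_for_frame` (returns the impulse response for frame k) and the zero-padding at the ends are assumptions made for the example, and the random arrays stand in for real audio and HRIR data.

```python
import numpy as np

def time_varying_ola(x, fir_for_frame, H):
    """Hann-window the input at hop H, convolve frame k with fir_for_frame(k), sum the frames."""
    L = len(fir_for_frame(0))                 # all FIRs are assumed to have this length
    n = np.arange(-H, H)
    w = 0.5 * np.cos(np.pi * n / H) + 0.5     # Hann, non-zero width 2*H
    n_out = 2 * H + L - 1                     # length of one convolved frame
    n_fft = 1
    while n_fft < n_out:                      # FFT must hold the linear convolution
        n_fft *= 2
    # pad so that every input sample is covered by two overlapping frames
    xp = np.concatenate([np.zeros(H), x, np.zeros(2 * H)])
    n_frames = (len(x) + H - 1) // H + 1
    y = np.zeros(len(xp) + L - 1)
    for k in range(n_frames):
        xk = w * xp[k * H:k * H + 2 * H]      # window on the input only
        Hk = np.fft.rfft(fir_for_frame(k), n_fft)
        yk = np.fft.irfft(np.fft.rfft(xk, n_fft) * Hk, n_fft)
        y[k * H:k * H + n_out] += yk[:n_out]  # slide by H and add
    return y[H:H + len(x) + L - 1]            # drop the leading pad

rng = np.random.default_rng(2)
x = rng.standard_normal(3000)
h0 = rng.standard_normal(512)
h1 = rng.standard_normal(512)

# constant FIR: should reduce to plain convolution
y_const = time_varying_ola(x, lambda k: h0, H=256)
print(np.allclose(y_const, np.convolve(x, h0)))

# FIR switched at frame 6: the Hann frames crossfade from h0 to h1
y_switch = time_varying_ola(x, lambda k: h0 if k < 6 else h1, H=256)
```

The constant-FIR check is a useful sanity test before modulating h[n]: if it fails, the framing or the FFT size is wrong.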

> What you do is in fact fading
> over from one sample set to another one. Try the first quarter of a
> sine/cosine wave cycle to control the fade over. If you use 50% overlap
> it is quite simple: apply a half sine window to each block and then add
> the overlap.

i would call that half-sine window a Hann window.  it needs to fade up 
just as it needs to fade out.  those are the two halves of the Hann.

>> Is it correct that every block now needs to contain 50% of the samples of
>> the previous block?
>
> Yes.
>

yes if the frame length is 2*H

or more, depending on your overlap.  if 75% overlap, you need 75% of the 
samples from the previous frame.

>> Where do i have to apply the window?

for this linear, but time-varying case, i think only on the input x[n]. 
  *not* additionally on the output frames y_k[n].
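
A rough illustration of what goes wrong when the window is applied twice, under a simplifying assumption: h[n] is a unit impulse, so each output frame equals its input frame and the second window lines up with the first. The helper `overlap_sum` is an illustrative name, and H = 256 is arbitrary.

```python
import numpy as np

H = 256
n = np.arange(-H, H)
w = 0.5 * np.cos(np.pi * n / H) + 0.5      # Hann, non-zero width 2*H

def overlap_sum(w, hop, n_frames=8):
    """Add n_frames copies of w, each shifted by hop; return a fully overlapped middle part."""
    total = np.zeros(hop * (n_frames - 1) + len(w))
    for k in range(n_frames):
        total[k * hop:k * hop + len(w)] += w
    return total[len(w):-len(w)]

once = overlap_sum(w, H)         # window on the input only
twice = overlap_sum(w * w, H)    # same window applied again on the output

print("once :", once.min(), once.max())
print("twice:", twice.min(), twice.max())
```

Windowed once, the frames sum to 1. Windowed twice, the sum swings between 0.5 and 1 once per hop, which is an amplitude modulation of the output. With a real FIR the output frames are longer than the window, so this is only an indication, not a full model.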

>> Do I have to apply it before the fft
>> convolution on the audio signal blocks (window size: 513 samples) or on
>> the ifft output (window size: 1024 samples)?
>
> Usually after that.

Nooooo, i dispute that, Marcel.   apply the window only on x[n].

> But it might work the other way around as well.
> (Didn't test)
>
>> And how many samples do I need to slide the fft output signal on the
>> timescale with 50% overlap?
>
> ???
> No idea what you want to slide.

i think he means slide it by the hop H: half the input frame length for 
50% overlap, and i would add a quarter of it for 75% overlap.

or perhaps he's asking about the degree or rate of modulation of h[n]. 
i dunno.

-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

On 17.06.15 01.49, scorprulebad wrote:
[...]
> In the next step I wanted to convolve each block with a slightly different
> impulse response. This led to a crackling noise between the blocks when
> the impulse response changed.

Obviously.

> After reading some tutorials on this website
> i found out that i have to apply a window and let the convolved blocks
> overlap, respectively get some redundancy in the audio blocks.

Fading over with a window function might work. It will fairly well 
compensate for changes of the amplitude of certain frequencies in the 
impulse response. But it will not adequately compensate for phase 
changes. They will cause some frequencies to cancel if done too fast.

> I didn't get
> how to do it exactly. Let us consider we want to reach an overlap of 50%
> (hop size 513/2 samples?) to use for example the hamming window.

No, this window does not come to zero. What you do is in fact fading 
over from one sample set to another one. Try the first quarter of a 
sine/cosine wave cycle to control the fade over. If you use 50% overlap 
it is quite simple: apply a half sine window to each block and then add 
the overlap.

> Is it correct that every block now needs to contain 50% of the samples of
> the previous block?

Yes.

> Where do i have to apply the window? Do I have to apply it before the fft
> convolution on the audio signal blocks (window size: 513 samples) or on
> the ifft output (window size: 1024 samples)?

Usually after that. But it might work the other way around as well. 
(Didn't test)

> And how many samples do I need to slide the fft output signal on the
> timescale with 50% overlap?

???
No idea what you want to slide.

Marcel

Here you can see the python code of the fft convolution algorithm without
window applied and hopsize = 0:

import numpy as np
import scipy.io.wavfile
from numpy.fft import fft, ifft

# set blocksizes
fft_blocksize = 1024
audio_blocksize = 513
hrtf_blocksize = 512
num_blocks = 5

# read in audio files (mono files assumed)
_, audiodata = scipy.io.wavfile.read("filename_audio_wave")
_, hrtf_block = scipy.io.wavfile.read("filename_hrtf_wave")

# output buffer, allocated once outside the loop; float avoids int16 overflow
binaural = np.zeros(fft_blocksize * num_blocks)

for blocknumber in range(num_blocks):

    # Do zeropadding: zeropad hrtf and audio
    hrtf_block_zeropadded = np.zeros(fft_blocksize)
    hrtf_block_zeropadded[0:hrtf_blocksize] = hrtf_block
    start = blocknumber * audio_blocksize
    audio_block_zeropadded = np.zeros(fft_blocksize)
    audio_block_zeropadded[0:audio_blocksize] = audiodata[start:start + audio_blocksize]

    # bring time domain input to frequency domain
    hrtf_block_fft = fft(hrtf_block_zeropadded, fft_blocksize)
    audio_block_fft = fft(audio_block_zeropadded, fft_blocksize)

    binaural_block_frequency = hrtf_block_fft * audio_block_fft
    binaural_block = ifft(binaural_block_frequency, fft_blocksize).real

    # add the block to the other blocks
    slide_forward_samples = 513
    out = blocknumber * slide_forward_samples
    binaural[out:out + fft_blocksize] += binaural_block
---------------------------------------
Posted through http://www.DSPRelated.com

Hello

I am new to this forum so at first I want to say hello to everyone :)

I am trying to make a fast fft convolution (FFT_Blocksize=1024 samples) of
a headphone-related impulse response (L=512 samples) with a sine wave
audio signal. Here you can see the plots of the time signals:

impulse response:
http://fs2.directupload.net/images/150617/fc9j6cs7.png
audio signal block:
http://fs1.directupload.net/images/150617/l8hcvl7q.png


For the fast convolution I split the wave audio signal into blocks with
blocksize M=513 samples to reach the fft convolution criterion L+M =
Blocksize+1. Then I zero-padded each wave block and the hrtf to 1024
samples, applied the fft, did the multiplication and applied the ifft.
You can see the result of one block in the following picture:
http://fs1.directupload.net/images/150617/bxoe9fkm.png

After this I joined the blocks by sliding each block 513 samples on the
time scale further than the last block (Hop Size = 0) and added the
samples. This worked without problems and good audio quality.

In the next step I wanted to convolve each block with a slightly different
impulse response. This led to a crackling noise between the blocks when
the impulse response changed. After reading some tutorials on this website
i found out that i have to apply a window and let the convolved blocks
overlap, respectively get some redundancy in the audio blocks. I didn't get
how to do it exactly. Let us consider we want to reach an overlap of 50%
(hop size 513/2 samples?) to use for example the hamming window.

My Questions are:

Is it correct that every block now needs to contain 50% of the samples of
the previous block?

Where do i have to apply the window? Do I have to apply it before the fft
convolution on the audio signal blocks (window size: 513 samples) or on
the ifft output (window size: 1024 samples)?

And how many samples do I need to slide the fft output signal on the
timescale with 50% overlap?

