DSPRelated.com
Forums

Overlap Save Implementation Issues

Started by MellanoxRoxx 3 years ago1 replylatest reply 3 years ago149 views

Hi,

I'm currently trying to implement the Overlap Save / Scrap method for educational purposes using KFR and C++.

However, I'm noticing that something is off when the FFT / block size is greater than the signal length. I have to admit that this is not really a real-life scenario (as you'd usually use either direct or standard FFT convolution), but I'd like to understand WHY this happens.

I realised this also happens for other choice of parameters... See below

From my understanding, Overlap Save / Scrap is always valid and should inherently provide the necessary padding, even when the signal is shorter than the FFT Size

void overlapSaveConv(
    float const * data,
    float* out,
    float const * filter,
    size_t nSamples,
    size_t filterLength,
    size_t FFTSize
) {


    if(FFTSize < (filterLength + 1)) throw std::invalid_argument("FFT Size must be at least one sample larger than filter length!");

    //N = L + M - 1
    //N = L + filterLength - 1
    //=> L = N - M + 1

    const size_t N = FFTSize;
    const size_t M = filterLength;
    const size_t step_size = N - (M - 1);

    //We need to zero pad the filter
    std::vector<float> paddedFilter; paddedFilter.resize(FFTSize);
    std::memcpy(paddedFilter.data(), filter, sizeof(float) * filterLength);
    //Compute Fourier Transform of filter, that will be used a lot of times in the loop
    auto H = kfr::realdft(kfr::make_univector(paddedFilter));

    //As far as I understand, it is not (easily) possible to implement overlap save in place
    //This is due to the fact that we prepend M-1 / overlap zeros and then go stepsize further
    //However, if we go stepsize further, the beginning of the data we need has already been overwritten by the previous block
    //results (due to the offset from the zeros)

    //Two possibilities:
    // - Copy the input array
    // - Separate buffer for the output array

    //As copying the input array allows us to get rid of logic inside the loop
    //(recognise whether we need to prepend or append zeros), this is our choice.

    //We need to prepend and append zeros to our data to account for transients (prepend) and the block size (append)
    std::vector<float> paddedData;
    paddedData.resize(
        M - 1 + //Prepend M-1 (filterLength - 1) zeros at the beginning
        nSamples + //Space for the actual samples
        FFTSize //Append zeros at the end so we don't need to worry about in the loop
    );
    std::memcpy(paddedData.data() + (M - 1), data, sizeof(float)*nSamples);


    size_t i = 0;
    while(i < nSamples) {

        //Gather the data of our current block and perform an FFT
        auto xSamplesUnivector = kfr::make_univector(paddedData.data() + i, FFTSize);
        auto fftx = kfr::realdft(xSamplesUnivector);

        //Apply the filter to our current block
        fftx = fftx * H;

        //Transform back to the time domain and apply proper scaling!
        auto yt = kfr::irealdft(fftx);
        yt /= double(yt.size());

        //Copy our result to the output
        std::memcpy(
            out + i,
            yt.data() + (M - 1), //Discard filterLength - 1 (M - 1) samples from the beginning
            std::min( //Make sure we don't write too many samples (as the last block is zero padded / has zeros appended, but the output target is shorter)
                (N - (M - 1)),
                nSamples - i + 1
            ) * sizeof(float)
        );

        i += step_size;
    }
}

See this example (Signal Length = 2^9 = 512, chosen FFT Block Size = 1024, filter length = 150 samples (Zero everywhere except unit value at 20th and 30th sample):

olsplot_39015.png

Signal Length = 2^12 (=> 4096), FFT Block Size = 1024, file length = 100 samples (Zero everywhere except unit value at 20th and 30th sample):

olsplot2_58062.png

[ - ]
Reply by MellanoxRoxxAugust 3, 2021

Update:

I have been able to partially identify the issue, it must be related to KFR.

When replacing the real dft with a standard complex dsp (minimally invasive), everything works just fine.

It seems like there is an issue when multiplying two real DFTs in KFR and transforming them back.

If you feed a complex DFT with the same data (numbers, not bit representation, of course), there are no issues...