Hi,
I'm currently trying to implement the Overlap Save / Scrap method for educational purposes using KFR and C++.
However, I'm noticing that something is off when the FFT / block size is greater than the signal length. I have to admit that this is not really a real-life scenario (as you'd usually use either direct or standard FFT convolution), but I'd like to understand WHY this happens.
I realised this also happens for other choice of parameters... See below
From my understanding, Overlap Save / Scrap is always valid and should inherently provide the necessary padding, even when the signal is shorter than the FFT Size
void overlapSaveConv(
float const * data,
float* out,
float const * filter,
size_t nSamples,
size_t filterLength,
size_t FFTSize
) {
if(FFTSize < (filterLength + 1)) throw std::invalid_argument("FFT Size must be at least one sample larger than filter length!");
//N = L + M - 1
//N = L + filterLength - 1
//=> L = N - M + 1
const size_t N = FFTSize;
const size_t M = filterLength;
const size_t step_size = N - (M - 1);
//We need to zero pad the filter
std::vector<float> paddedFilter; paddedFilter.resize(FFTSize);
std::memcpy(paddedFilter.data(), filter, sizeof(float) * filterLength);
//Compute Fourier Transform of filter, that will be used a lot of times in the loop
auto H = kfr::realdft(kfr::make_univector(paddedFilter));
//As far as I understand, it is not (easily) possible to implement overlap save in place
//This is due to the fact that we prepend M-1 / overlap zeros and then go stepsize further
//However, if we go stepsize further, the beginning of the data we need has already been overwritten by the previous block
//results (due to the offset from the zeros)
//Two possibilities:
// - Copy the input array
// - Separate buffer for the output array
//As copying the input array allows us to get rid of logic inside the loop
//(recognise whether we need to prepend or append zeros), this is our choice.
//We need to prepend and append zeros to our data to account for transients (prepend) and the block size (append)
std::vector<float> paddedData;
paddedData.resize(
M - 1 + //Prepend M-1 (filterLength - 1) zeros at the beginning
nSamples + //Space for the actual samples
FFTSize //Append zeros at the end so we don't need to worry about in the loop
);
std::memcpy(paddedData.data() + (M - 1), data, sizeof(float)*nSamples);
size_t i = 0;
while(i < nSamples) {
//Gather the data of our current block and perform an FFT
auto xSamplesUnivector = kfr::make_univector(paddedData.data() + i, FFTSize);
auto fftx = kfr::realdft(xSamplesUnivector);
//Apply the filter to our current block
fftx = fftx * H;
//Transform back to the time domain and apply proper scaling!
auto yt = kfr::irealdft(fftx);
yt /= double(yt.size());
//Copy our result to the output
std::memcpy(
out + i,
yt.data() + (M - 1), //Discard filterLength - 1 (M - 1) samples from the beginning
std::min( //Make sure we don't write too many samples (as the last block is zero padded / has zeros appended, but the output target is shorter)
(N - (M - 1)),
nSamples - i + 1
) * sizeof(float)
);
i += step_size;
}
}
See this example (Signal Length = 2^9 = 512, chosen FFT Block Size = 1024, filter length = 150 samples (Zero everywhere except unit value at 20th and 30th sample):

Signal Length = 2^12 (=> 4096), FFT Block Size = 1024, file length = 100 samples (Zero everywhere except unit value at 20th and 30th sample):

Update:
I have been able to partially identify the issue, it must be related to KFR.
When replacing the real dft with a standard complex dsp (minimally invasive), everything works just fine.
It seems like there is an issue when multiplying two real DFTs in KFR and transforming them back.
If you feed a complex DFT with the same data (numbers, not bit representation, of course), there are no issues...






