Separate two sinusoids with very close frequencies (5 Hz difference) in an audio signal, with good time-domain resolution

Started by Random6wx3e3qvwp September 18, 2018
I have an audio signal which is a mix of: 

* a 2000 Hz sinusoid, beginning at 1.00 sec,  without fade-in/fade-out

* a 2005 Hz sinusoid, beginning at 1.031 sec, with a slow fade-out at the end

* background noise

(link to the WAV file: https://file.io/jo0h9V)

(In reality it's even more complex: the sinusoids can also vary in amplitude...)

Question: How to separate the signal into two signals (the two sinusoids), with a
good time-domain resolution?  (no leakage in the time domain)

## Attempt 1

I zero-padded the signal to the next power of 2 (final length: 524288) and computed a
real FFT.

The real-FFT vector `h` has 262145 frequency bins covering the frequency
range [0, 22050 Hz], so each bin is 0.084 Hz wide. Pretty good news: we can
easily distinguish the two sinusoids with this!
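
As a quick check of the bin arithmetic, here is a small sketch that recomputes the bin count, the bin width, and the bins where the two tones land. The 44100 Hz sample rate is an assumption inferred from the 22050 Hz upper limit, and the variable names are only illustrative.

```python
# Assumption: sample rate of 44100 Hz, inferred from the [0, 22050 Hz] range.
fs = 44100
n_fft = 524288                      # zero-padded length (2**19)

n_bins = n_fft // 2 + 1             # number of real-FFT bins
bin_width = fs / n_fft              # Hz per bin

# Nearest bin index for each tone.
bin_2000 = round(2000 / bin_width)
bin_2005 = round(2005 / bin_width)

print(n_bins, round(bin_width, 4), bin_2000, bin_2005)
```

Under that assumption, both tone bins fall inside the index ranges [23750, 23800) and [23810, 23860) used for the masking.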

Now we can isolate the two sinusoids with:

    # Assumption: h = rfft(x), with rfft/irfft taken from numpy.fft
    h1 = h.copy()   # the real-FFT
    h1[:23750] = 0  # zeroing bins outside [23750, 23800), around 2000 Hz
    h1[23800:] = 0
    x1 = irfft(h1)  # inverse real-FFT

    h2 = h.copy()
    h2[:23810] = 0  # zeroing bins outside [23810, 23860), around 2005 Hz
    h2[23860:] = 0
    x2 = irfft(h2)  # inverse real-FFT

It works, but the time-domain resolution is very bad (which is expected, because the
frequency resolution is very high): the sine is very poorly localized in
the time domain.

=> Problem: instead of a fast attack in the separated sinusoid, we have a slow
fade-in...

Of course, zeroing bins outside of the frequency range of interest (bins
[23750:23800]) is not optimal, I should have used a (non-rectangular) window in the
frequency domain instead:

    h1[:23750] = 0
    h1[23800:] = 0
    h1[23750:23800] *= window  # window: non-rectangular taper of length 50

But even with such windowing, I doubt I can avoid poor time-domain resolution
after the separation.
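
The slow fade-in described in this attempt can be reproduced on synthetic data. The sketch below is not the original recording: it assumes a 44100 Hz sample rate, a noise-free 2000 Hz tone with an abrupt onset at 1.0 s, and a brick-wall band of ±2 Hz (roughly the width of the 50-bin mask above). It measures how much of the separated signal appears before the true onset.

```python
import numpy as np

fs = 44100                          # assumed sample rate
n = 2**17                           # about 2.97 s of signal
t = np.arange(n) / fs

# Synthetic stand-in for the recording: a 2000 Hz tone with an abrupt
# onset at 1.0 s and an abrupt end at 2.5 s, no noise.
x = np.sin(2 * np.pi * 2000 * t) * ((t >= 1.0) & (t < 2.5))

# Brick-wall band-pass in the frequency domain: keep 2000 Hz +/- 2 Hz.
h = np.fft.rfft(x)
freqs = np.fft.rfftfreq(n, d=1 / fs)
h[np.abs(freqs - 2000) > 2.0] = 0
y = np.fft.irfft(h, n)

# RMS of the filtered signal in the 100 ms *before* the true onset,
# where the input is exactly zero.
pre = (t >= 0.9) & (t < 1.0)
leak_rms = np.sqrt(np.mean(y[pre] ** 2))
print(f"pre-onset RMS of the filtered signal: {leak_rms:.3f}")
```

A band only a few Hz wide corresponds to an impulse response a few hundred milliseconds long, so the onset is smeared over roughly that duration whatever the shape of the frequency-domain window.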

## Attempt 2

Use an STFT instead of a global FFT of the signal. This helps with localization, but
to get a frequency resolution good enough to separate the two sinusoids,
we have to take a big FFTSIZE, such as 16384. Even then, each of the
8193 frequency bins (real FFT) is 2.7 Hz wide!
That is not enough to distinguish or separate two sinusoids that are only 5 Hz
apart... So this approach will fail.
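
To put numbers on this trade-off, here is a rough sketch that tabulates bin width and frame duration for a few FFT sizes. It assumes a 44100 Hz sample rate and, purely for illustration, a Hann analysis window (the post does not say which window is used); `stft_tradeoff` is a hypothetical helper name.

```python
def stft_tradeoff(fft_size, fs=44100):
    """Return (bin width in Hz, frame duration in s) for one STFT frame."""
    return fs / fft_size, fft_size / fs

for fft_size in (1024, 4096, 16384, 65536):
    bin_width, frame_len = stft_tradeoff(fft_size)
    # A Hann window has a null-to-null main lobe of 4 bins.
    hann_lobe = 4 * bin_width
    print(f"{fft_size:6d}  {bin_width:6.2f} Hz/bin  "
          f"{frame_len:6.3f} s/frame  Hann main lobe {hann_lobe:6.2f} Hz")
```

With FFTSIZE 16384 each frame spans about 0.37 s, far longer than the 31 ms between the two onsets, while a Hann main lobe of about 10.8 Hz is still wider than the 5 Hz spacing.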

I know this is probably an example of the time-frequency trade off / uncertainty
principle, but in this precise case, is there something else we can do to improve
the separation?
Random6wx3e3qvwp  <xyzprod123@gmail.com> wrote:

> I have an audio signal which is a mix of:
>
> * a 2000 Hz sinusoid, beginning at 1.00 sec, without fade-in/fade-out
> * a 2005 Hz sinusoid, beginning at 1.031 sec, with a slow fade-out at the end
>
> Question: How to separate the signal into two signals (the two
> sinusoids), with a good time-domain resolution?

So you have about 62 full cycles of sinusoid A (0.031 s × 2000 Hz) before sinusoid B starts. This is good. I would process this in the time domain.

During that interval, I would measure the amplitude and phase of signal A (which for medium or high SNR can be done accurately). I would then measure the instantaneous phase of the signal, relative to the above measurement, from 1.031 seconds onward. From this you can back out at least an approximation to the amplitude-varying signal B, and subtract it off. The approximation sin(A+B) ≈ sin(A) + cos(A)sin(B) for small B may be useful.

The problem is that the magnitude of B is modulated, so you will probably need some heuristics to recover both the phase and the envelope of B. Perhaps a sufficient estimate of B's phase and maximum magnitude could be formed by truncating the instantaneous phase signal formed above to the peak region (the region where the instantaneous phase deviates most from the measured phase of A).

If you know the initial phase of A and B, that simplifies the problem.

Steve

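
A minimal sketch of the first step suggested here, on synthetic data. It is a simplified variant, not the instantaneous-phase procedure itself: it fits the amplitude and phase of A by least squares over the interval where A is alone, then extrapolates A and subtracts it to leave an estimate of B. It assumes a 44100 Hz sample rate, an exactly known 2000 Hz frequency, and a constant amplitude for A; all signal parameters are made up.

```python
import numpy as np

fs = 44100                                   # assumed sample rate
t = np.arange(2 * fs) / fs                   # 2 s of synthetic data
rng = np.random.default_rng(0)

# Synthetic stand-in for the recording (amplitudes and phases are made up).
a = 1.0 * np.sin(2 * np.pi * 2000 * t + 0.4) * (t >= 1.0)
fade = np.clip((2.0 - t) / 0.5, 0.0, 1.0)    # slow fade-out over the last 0.5 s
b = 0.6 * np.sin(2 * np.pi * 2005 * t + 1.1) * (t >= 1.031) * fade
x = a + b + 0.01 * rng.standard_normal(t.size)

# Step 1: fit amplitude and phase of A where it is alone (1.000 s to 1.031 s).
alone = (t >= 1.0) & (t < 1.031)
basis = np.column_stack([np.sin(2 * np.pi * 2000 * t),
                         np.cos(2 * np.pi * 2000 * t)])
coef, *_ = np.linalg.lstsq(basis[alone], x[alone], rcond=None)
amp_a = np.hypot(coef[0], coef[1])
phase_a = np.arctan2(coef[1], coef[0])

# Step 2: extrapolate A over the rest of the signal and subtract it.
a_hat = (basis @ coef) * (t >= 1.0)
b_hat = x - a_hat                            # estimate of B (plus noise)

print(f"A: amplitude {amp_a:.3f}, phase {phase_a:.3f} rad")
```

If A's amplitude or frequency drifts after 1.031 s, this simple extrapolation degrades, which is where tracking the instantaneous phase would come in.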
On Tuesday, September 18, 2018 at 1:00:42 PM UTC-7, Random6wx3e3qvwp wrote:
> I have an audio signal which is a mix of:
>
> * a 2000 Hz sinusoid, beginning at 1.00 sec, without fade-in/fade-out
> * a 2005 Hz sinusoid, beginning at 1.031 sec, with a slow fade-out at the end
> * background noise
>
> (link to the WAV file: https://file.io/jo0h9V)
>
> (In reality it's even more complex: the sinusoids can also vary in amplitude...)
>
> Question: How to separate the signal into two signals (the two sinusoids),
> with a good time-domain resolution? (no leakage in the time domain)

(snip)

> I know this is probably an example of the time-frequency trade off /
> uncertainty principle, but in this precise case, is there something
> else we can do to improve the separation?

If you know the exact parameters of the data, then it is easy.

But generally, you don't know that.

If you know that the data is modulated 2000 Hz and 2005 Hz sinusoids, do a least squares fit to some combination of such sines and cosines. (Cosines, if you don't know the phase.)

Fit to

    f(t) = A*sin(4000*pi*t) + B*cos(4000*pi*t) +
           C*sin(4010*pi*t) + D*cos(4010*pi*t) +
           E*t*sin(4010*pi*t) + F*t*cos(4010*pi*t)

You can add more terms if you find them useful.

The E and F terms allow for a linear envelope on the 2005 Hz sinusoid; you could have higher powers, too. You have plenty of data, so you could fit many more parameters.

This forces the calculation to use only these terms, whereas the FFT allows for all the frequencies, and then you force to zero the ones that you don't want to know about.

In general when doing fitting, you have to be careful what you force, and when.

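
The six-term fit can be sketched with an ordinary linear least-squares solve. This is an illustration on synthetic data, assuming a 44100 Hz sample rate, exactly known frequencies, and a linear envelope on the 2005 Hz tone; the amplitudes, phases, and noise level are made up, and the fit is restricted to the stretch where both tones are present.

```python
import numpy as np

fs = 44100                                          # assumed sample rate
t = np.arange(int(1.031 * fs) + 1, 2 * fs) / fs     # both tones present
rng = np.random.default_rng(0)

# Synthetic stand-in: constant 2000 Hz tone, 2005 Hz tone with linear envelope.
a = 1.0 * np.sin(4000 * np.pi * t + 0.4)
b = (0.9 - 0.4 * t) * np.sin(4010 * np.pi * t + 1.1)
x = a + b + 0.01 * rng.standard_normal(t.size)

# Design matrix with one column per term of f(t).
s0, c0 = np.sin(4000 * np.pi * t), np.cos(4000 * np.pi * t)
s1, c1 = np.sin(4010 * np.pi * t), np.cos(4010 * np.pi * t)
M = np.column_stack([s0, c0, s1, c1, t * s1, t * c1])

# Solve for A..F in the least-squares sense.
coef, *_ = np.linalg.lstsq(M, x, rcond=None)

# Rebuild each separated sinusoid from its own terms.
a_hat = M[:, :2] @ coef[:2]        # A and B terms: 2000 Hz component
b_hat = M[:, 2:] @ coef[2:]        # C..F terms: 2005 Hz component

print("A..F =", np.round(coef, 3))
```

Because the model has no onset smearing built in, the separated components keep their sharp edges as long as the fit window is chosen to match where each tone is actually present.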
On Sunday, October 14, 2018 at 2:47:38 AM UTC+13, gah...@gmail.com wrote:
> (snip)
> If you know that the data is modulated 2000Hz and 2005Hz sinusoids,
> do a least squares fit to some combination of such sines and
> cosines. (Cosines, if you don't know the phase.)
> (snip)

Parametric modeling via an AR model may work.
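
One possible reading of this suggestion, as a sketch only: fit an order-4 AR (linear prediction) model by least squares and read the two frequencies off the angles of the polynomial roots. The example uses a noise-free synthetic signal and assumes an 8000 Hz sample rate (for instance after decimation), chosen here to keep the root-finding well conditioned; neither choice comes from the thread.

```python
import numpy as np

fs = 8000.0                         # assumed sample rate for this illustration
n = np.arange(4000)                 # 0.5 s of data
x = (np.sin(2 * np.pi * 2000 * n / fs)
     + 0.7 * np.sin(2 * np.pi * 2005 * n / fs + 0.3))

p = 4                               # two real sinusoids -> two conjugate pole pairs

# Linear prediction (covariance method): x[k] ~ sum_i c[i] * x[k-1-i]
X = np.column_stack([x[p - 1 - i: len(x) - 1 - i] for i in range(p)])
y = x[p:]
c, *_ = np.linalg.lstsq(X, y, rcond=None)

# Roots of z^p - c[0] z^(p-1) - ... - c[p-1]; their angles give the frequencies.
roots = np.roots(np.concatenate(([1.0], -c)))
angles = np.angle(roots)
freqs = np.sort(angles[angles > 0] * fs / (2 * np.pi))

print("estimated frequencies (Hz):", np.round(freqs, 3))
```

With noise, a higher model order and some pre-filtering would likely be needed, and the AR fit only estimates the frequencies; amplitudes and envelopes would still have to come from a separate fitting step.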