comp.dsp | Separate two sinusoids with very close frequencies (5 Hz difference) in an audio signal, with good time-domain resolution

I have an audio signal which is a mix of: 

* a 2000 Hz sinusoid, beginning at 1.00 sec,  without fade-in/fade-out

* a 2005 Hz sinusoid, beginning at 1.031 sec, with a slow fade-out at the end

* background noise

(link to the WAV file: https://file.io/jo0h9V)

(In reality it's even more complex: the sinusoids can also vary in amplitude...)

Question: How to separate the signal into two signals (the two sinusoids), with a good time-domain resolution?  (no leakage in the time domain)

##Attempt 1

I zero-padded the signal to the next power of 2 (final length: 524288) and took a real-FFT.

The real-FFT vector `h` has 262145 frequency bins covering the range [0, 22050 Hz], so each bin is 0.084 Hz wide. Pretty good news: we can easily distinguish the two sinusoids with this!
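
As a quick check of the arithmetic above, this sketch computes the bin width and the bins nearest to 2000 Hz and 2005 Hz. It assumes a 44100 Hz sample rate (implied by the 22050 Hz upper limit); `freq_to_bin` is a hypothetical helper, not part of the original code.

```python
fs = 44100        # sample rate in Hz (assumption, implied by the 22050 Hz range)
n_fft = 524288    # zero-padded length from the post

n_bins = n_fft // 2 + 1    # length of the real-FFT output
bin_width = fs / n_fft     # Hz per bin


def freq_to_bin(f_hz):
    """Index of the real-FFT bin closest to f_hz (hypothetical helper)."""
    return round(f_hz / bin_width)


bin_2000 = freq_to_bin(2000)
bin_2005 = freq_to_bin(2005)
print(n_bins, bin_width, bin_2000, bin_2005)
```

Both indices fall inside the bin ranges used in the code that follows ([23750, 23800) and [23810, 23860)).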

Now we can isolate the two sinusoids with:

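    from numpy.fft import rfft, irfft  # assuming NumPy

    h = rfft(x, 524288)  # x: the audio samples; real-FFT, zero-padded to 2**19

    h1 = h.copy()
    h1[:23750] = 0  # zero the bins outside [23750, 23800), around 2000 Hz
    h1[23800:] = 0
    x1 = irfft(h1)  # inverse real-FFT: the 2000 Hz component

    h2 = h.copy()
    h2[:23810] = 0  # zero the bins outside [23810, 23860), around 2005 Hz
    h2[23860:] = 0
    x2 = irfft(h2)  # inverse real-FFT: the 2005 Hz component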

It works, but the time-domain resolution is very poor (as expected, since the frequency resolution is very high): the sine is very poorly localized in the time domain.

=> Problem: instead of a fast attack in the separated sinusoid, we have a slow fade-in...

Of course, zeroing bins outside of the frequency range of interest (bins [23750:23800]) is not optimal, I should have used a (non-rectangular) window in the frequency domain instead:

    h1[:23750] = 0
    h1[23800:] = 0
    h1[23750:23800] *= window  # window: a 50-point taper, e.g. Hann

But even with such a window, I doubt I can avoid poor time-domain resolution after the separation.
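
A minimal sketch of the tapered mask, assuming NumPy and a synthetic stand-in for the WAV file (an 8000 Hz sample rate and two steady tones, chosen so that both frequencies land exactly on a bin). `isolate_band` is a hypothetical helper, and the Hann taper is only one possible choice of window.

```python
import numpy as np


def isolate_band(x, lo_bin, hi_bin):
    """Keep real-FFT bins [lo_bin, hi_bin) under a Hann taper, zero the rest."""
    h = np.fft.rfft(x)
    mask = np.zeros(len(h))
    mask[lo_bin:hi_bin] = np.hanning(hi_bin - lo_bin)
    return np.fft.irfft(h * mask, n=len(x))


# Synthetic test signal (made-up values): both tones last the whole 4 s.
fs = 8000
t = np.arange(4 * fs) / fs
s1 = np.sin(2 * np.pi * 2000 * t)
s2 = 0.5 * np.sin(2 * np.pi * 2005 * t)
x = s1 + s2

# Bin width is fs / len(x) = 0.25 Hz: 2000 Hz is bin 8000, 2005 Hz is bin 8020.
x1 = isolate_band(x, 7992, 8009)   # 17 bins centred on bin 8000
x2 = isolate_band(x, 8012, 8029)   # 17 bins centred on bin 8020
```

With steady tones the separation is essentially exact. With an abrupt onset, as in the real signal, a pass band B Hz wide still implies a time-domain response lasting roughly 1/B seconds, so the taper changes the shape of the fade-in but does not remove it.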

##Attempt 2

Use an STFT instead of a global FFT of the signal. This helps to localize in time, but to get enough frequency resolution to separate the two sinusoids we have to take a big FFTSIZE, such as 16384. Even then, each of the 8193 frequency bins (real-FFT) is 2.7 Hz wide!
Not enough to distinguish or separate two sinusoids that are only 5 Hz apart... So this approach will fail.
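
To put numbers on this trade-off, the sketch below tabulates bin width against frame duration for a few FFT sizes. It assumes a 44100 Hz sample rate and a Hann analysis window (whose main lobe is 4 bins wide, null to null); `stft_tradeoff` is a hypothetical helper.

```python
def stft_tradeoff(n_fft, fs=44100):
    """Return (bin width in Hz, frame duration in s, Hann main-lobe width in Hz)."""
    bin_width = fs / n_fft
    duration = n_fft / fs
    hann_mainlobe = 4 * bin_width   # null-to-null width of a Hann main lobe
    return bin_width, duration, hann_mainlobe


for n_fft in (4096, 16384, 65536):
    bw, dur, lobe = stft_tradeoff(n_fft)
    print(f"{n_fft:6d}  bin {bw:6.2f} Hz  frame {dur * 1000:7.1f} ms  Hann lobe {lobe:6.2f} Hz")
```

At FFTSIZE 16384 the Hann main lobe (about 10.8 Hz) is still wider than the 5 Hz spacing, while each frame already spans about 0.37 s.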

I know this is probably an example of the time-frequency trade off / uncertainty principle, but in this precise case, is there something else we can do to improve the separation?

Reply by Steve Pope ●September 18, 2018

Random6wx3e3qvwp  <xyzprod123@gmail.com> wrote:

>I have an audio signal which is a mix of: 
>
>* a 2000 Hz sinusoid, beginning at 1.00 sec,  without fade-in/fade-out
>
>* a 2005 Hz sinusoid, beginning at 1.031 sec, with a slow fade-out at the end

>Question: How to separate the signal into two signals (the two
>sinusoids), with a good time-domain resolution?  

So you have about 62 full cycles of sinusoid A (31 ms at 2000 Hz)
before sinusoid B starts. This is good.

I would process this in the time domain.  During those first
cycles, I would measure the amplitude and phase of signal A 
(which for medium or high SNR can be done accurately).

I would then measure the instantaneous phase of the signal,
relative to the above measurement, from 1.031 seconds onward.  
From this you can back out at least an approximation to the 
amplitude-varying signal B, and subtract it off.

The approximation sin(A+B) ≈ sin(A) + cos(A)sin(B) for
small B may be useful.  

The problem is that the magnitude of B is modulated 
so you will probably need some heuristics to recover
both the phase and the envelope of B.  Perhaps a sufficient 
estimate of B's phase and maximum magnitude could be formed 
from a truncation of the above-formed instantaneous 
phase signal to the peak region (the region where the
instantaneous phase most deviates from the measured phase
of A.)

If you know the initial phase of A and B that simplifies
the problem.
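
A simplified sketch of the first step, assuming NumPy, a known 2000 Hz frequency, and a constant amplitude for A. Instead of tracking the instantaneous phase, it fits A on the 31 ms where A is alone, extrapolates it, and subtracts it, leaving B plus noise. The test signal and the helper `fit_tone` are made up for illustration.

```python
import numpy as np


def fit_tone(x, t, f_hz):
    """Least-squares (a, b) for a*sin(2*pi*f*t) + b*cos(2*pi*f*t)."""
    basis = np.column_stack([np.sin(2 * np.pi * f_hz * t),
                             np.cos(2 * np.pi * f_hz * t)])
    coef, *_ = np.linalg.lstsq(basis, x, rcond=None)
    return coef


# Synthetic stand-in for the recording (all amplitudes and phases are made up).
fs = 44100
t = np.arange(3 * fs) / fs
rng = np.random.default_rng(0)
a_true = 0.8 * np.sin(2 * np.pi * 2000 * t + 0.3) * (t >= 1.0)
b_env = np.clip(3.0 - t, 0.0, 1.0) * (t >= 1.031)   # slow fade-out from 2 s to 3 s
b_true = 0.5 * b_env * np.sin(2 * np.pi * 2005 * t + 1.1)
x = a_true + b_true + 0.001 * rng.standard_normal(t.size)

# Measure A where it is alone: from 1.000 s to 1.031 s.
only_a = (t >= 1.0) & (t < 1.031)
a, b = fit_tone(x[only_a], t[only_a], 2000.0)
amp_a = np.hypot(a, b)   # estimated amplitude of A

# Extrapolate A over the rest of the signal and subtract it.
a_est = (a * np.sin(2 * np.pi * 2000 * t) + b * np.cos(2 * np.pi * 2000 * t)) * (t >= 1.0)
b_est = x - a_est        # what remains: B plus background noise
```

This only holds while A really keeps the amplitude and phase measured in the first 31 ms; if A is itself modulated, the residual will contain the difference.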

Steve

Reply by ●October 13, 2018

On Tuesday, September 18, 2018 at 1:00:42 PM UTC-7, Random6wx3e3qvwp wrote:
> I have an audio signal which is a mix of: 
> 
> * a 2000 Hz sinusoid, beginning at 1.00 sec,  without fade-in/fade-out
> 
> * a 2005 Hz sinusoid, beginning at 1.031 sec, with a slow fade-out at the end
> 
> * background noise
> 
> (link to the WAV file: https://file.io/jo0h9V)
> 
> (In reality it's even more complex: the sinusoids can also vary in amplitude...)
> 
> Question: How to separate the signal into two signals (the two sinusoids),
> with a good time-domain resolution?  (no leakage in the time domain)

(snip)

> I know this is probably an example of the time-frequency trade off /
> uncertainty principle, but in this precise case, is there something
> else we can do to improve the separation?

If you know the exact parameters of the data, then it is easy.

But generally, you don't know that.

If you know that the data is modulated 2000Hz and 2005Hz sinusoids,
do a least squares fit to some combination of such sines and
cosines.  (Cosines, if you don't know the phase.)

Fit to f(t)=A*sin(4000*pi*t) + B*cos(4000*pi*t) + 
            C*sin(4010*pi*t) + D*cos(4010*pi*t) +
            E*t*sin(4010*pi*t) + F*t*cos(4010*pi*t)

you can add more terms if you find them useful.

The E and F terms allow for a linear envelope on the 2005Hz,
you could have higher powers, too.  You have plenty of data,
so you could fit many more parameters.

This forces the calculation to use only these terms, whereas
the FFT allows for all frequencies, after which you force
to zero the ones that you don't want to know about.  

In general when doing fitting, you have to be careful what
you force, and when.
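
A minimal sketch of such a fit, assuming NumPy and a synthetic one-second segment in which both tones are present (the amplitudes, phases and linear envelope are made up). `design_matrix` is a hypothetical helper whose columns follow the six terms of f(t) above.

```python
import numpy as np


def design_matrix(t):
    """Columns are the six basis functions of the model f(t)."""
    w1, w2 = 4000 * np.pi, 4010 * np.pi          # 2000 Hz and 2005 Hz
    return np.column_stack([
        np.sin(w1 * t), np.cos(w1 * t),          # A, B
        np.sin(w2 * t), np.cos(w2 * t),          # C, D
        t * np.sin(w2 * t), t * np.cos(w2 * t),  # E, F
    ])


# Synthetic segment (made-up values): steady 2000 Hz, linearly fading 2005 Hz.
fs = 44100
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
tone_a = 0.8 * np.sin(4000 * np.pi * t + 0.3)
tone_b = 0.5 * (1.0 - 0.5 * t) * np.sin(4010 * np.pi * t + 1.1)
x = tone_a + tone_b + 0.001 * rng.standard_normal(t.size)

M = design_matrix(t)
coef, *_ = np.linalg.lstsq(M, x, rcond=None)   # coef = [A, B, C, D, E, F]

est_a = M[:, :2] @ coef[:2]   # reconstructed 2000 Hz component
est_b = M[:, 2:] @ coef[2:]   # reconstructed 2005 Hz component
```

The fit should only cover samples where the model applies, so the onset times are treated as known here. Higher-order envelope terms would simply be extra columns in `design_matrix`.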

Reply by ●October 14, 2018

On Sunday, October 14, 2018 at 2:47:38 AM UTC+13, gah...@gmail.com wrote:
> On Tuesday, September 18, 2018 at 1:00:42 PM UTC-7, Random6wx3e3qvwp wrote:
> > I have an audio signal which is a mix of: 
> > 
> > * a 2000 Hz sinusoid, beginning at 1.00 sec,  without fade-in/fade-out
> > 
> > * a 2005 Hz sinusoid, beginning at 1.031 sec, with a slow fade-out at the end
> > 
> > * background noise
> > 
> > (link to the WAV file: https://file.io/jo0h9V)
> > 
> > (In reality it's even more complex: the sinusoids can also vary in amplitude...)
> > 
> > Question: How to separate the signal into two signals (the two sinusoids),
> > with a good time-domain resolution?  (no leakage in the time domain)
> 
> (snip)
> 
> > I know this is probably an example of the time-frequency trade off /
> > uncertainty principle, but in this precise case, is there something
> > else we can do to improve the separation?
> 
> If you know the exact parameters of the data, then it is easy.
> 
> But generally, you don't know that.
> 
> If you know that the data is modulated 2000Hz and 2005Hz sinusoids,
> do a least squares fit to some combination of such sines and
> cosines.  (Cosines, if you don't know the phase.)
> 
> Fit to f(t)=A*sin(4000*pi*t) + B*cos(4000*pi*t) + 
>             C*sin(4010*pi*t) + D*cos(4010*pi*t) +
>             E*t*sin(4010*pi*t) + F*t*cos(4010*pi*t)
> 
> 
> you can add more terms if you find them useful.
> 
> 
> The E and F terms allow for a linear envelope on the 2005Hz,
> you could have higher powers, too.  You have plenty of data,
> so you could fit many more parameters.
> 
> 
> This forces the calculation to use only these terms, where
> the FFT allows for all the frequencies, and then you force
> to zero ones that you don't want to know about.  
> 
> 
> In general when doing fitting, you have to be careful what
> you force, and when.

Parametric modeling via an AR model may work.
