DSPRelated.com

Question about source separation

Started by Lars Hansen November 9, 2005
Hi

I am looking for some online literature about the following:

How do I estimate the speech signals s1 and s2 based on the observation:

y[k]=c1*s1[k]+c2*s2[k]

where c1 and c2 are constants.

I have searched for MATLAB algorithms for underdetermined source
separation (when there are more sources than observations), but I couldn't
find anything.

Hope you guys can help?

Thanks...



The general problem is ill-posed. Maybe this will help:

Y. Ephraim and H. L. Van Trees, "A signal subspace approach for speech enhancement," IEEE Transactions on Speech and Audio Processing, vol. 3, no. 4, pp. 251-266, July 1995.

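To see concretely why the single-channel problem is ill-posed, here is a minimal Python/NumPy sketch. The constants c1 = 0.8, c2 = 0.5 and the random test signals are arbitrary assumptions, not taken from the thread. For any guess of s1, choosing s2 = (y - c1*s1)/c2 reproduces y exactly, so the observation alone cannot identify the true sources without extra prior knowledge.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
c1, c2 = 0.8, 0.5  # hypothetical mixing constants

# "True" sources and the single observed mixture y[k] = c1*s1[k] + c2*s2[k]
s1 = rng.standard_normal(N)
s2 = rng.standard_normal(N)
y = c1 * s1 + c2 * s2

# Pick ANY other s1; solving for s2 makes the pair explain y perfectly
s1_alt = rng.standard_normal(N)
s2_alt = (y - c1 * s1_alt) / c2
y_alt = c1 * s1_alt + c2 * s2_alt

print(np.allclose(y, y_alt))    # True: both source pairs reproduce the observation
print(np.allclose(s1, s1_alt))  # False: yet the sources are different
```

Any separation method for one channel therefore has to bring in assumptions about the sources themselves (pitch, spectra, statistics).
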
Lars,

Search internet for

     cochannel speaker separation

 or

     cochannel voice separation

Good Luck,

Dirk


That's actually not too difficult. Normally c1 and c2 are z-domain transfer
functions with delays, etc. They are of course non-minimum phase.
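A sketch of the convolutive mixing model described above, in Python/NumPy, assuming hypothetical FIR filters h1 and h2 in place of the constants (the thread gives no actual filters). With one-tap filters h1 = [c1], h2 = [c2] it collapses back to the original instantaneous model. The helper is_minimum_phase (also hypothetical) checks whether all zeros of an FIR filter lie inside the unit circle; a later reflection that is stronger than the first arrival is enough to make a response non-minimum phase.

```python
import numpy as np

def convolutive_mix(s1, s2, h1, h2):
    """y[k] = sum_i h1[i]*s1[k-i] + sum_i h2[i]*s2[k-i], truncated to len(s1).

    Assumes s1 and s2 have the same length.
    """
    n = len(s1)
    return np.convolve(s1, h1)[:n] + np.convolve(s2, h2)[:n]

def is_minimum_phase(h):
    """True if every zero of the FIR filter h lies strictly inside the unit circle.

    np.roots drops leading zero coefficients (pure delays) before finding roots.
    """
    zeros = np.roots(h)
    return bool(np.all(np.abs(zeros) < 1.0))

rng = np.random.default_rng(0)
s1 = rng.standard_normal(500)
s2 = rng.standard_normal(500)

# Hypothetical room responses:
# h1: 2-sample delay, then a reflection stronger than the first arrival
# h2: direct path plus a weaker reflection
h1 = np.array([0.0, 0.0, 0.4, 0.0, 1.0])
h2 = np.array([0.7, 0.0, 0.0, 0.3])
y = convolutive_mix(s1, s2, h1, h2)

# One-tap "filters" reduce to y[k] = c1*s1[k] + c2*s2[k]
y_inst = convolutive_mix(s1, s2, [0.8], [0.5])
print(np.allclose(y_inst, 0.8 * s1 + 0.5 * s2))  # True
print(is_minimum_phase(h1), is_minimum_phase(h2))  # False True
```
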

Naebad

OK... where do I find an algorithm that solves this problem? (In MATLAB,
please.) Thanks.

naebad,

'Not too difficult' does not match the applications I have seen:

a) multiple people engaged in multiple conversations in approximately the
   same location (bar, restaurant), when you only want to hear one of them,
   or maybe several of them separately, but only have a single microphone
   or single-channel recording;
b) multiple people using the same communications channel for possibly
   different conversations, with a distant sensor picking them both up on
   the same channel;
c) recordings of telephone calls with people simultaneously talking in the
   background.

Unfortunately, what we often see on 'CSI: Miami' is pure fantasy.

People have been working on this problem at least as far back as the
early 1980s (probably longer). I followed what was being done in
certain areas from then until about 1992; little was accomplished.
There was early work using pitch estimation of both speakers,
pitch tracking for each speaker (pitch tracks normally cross, BTW),
spectral separation, and then reconstruction. Outside of a lab,
users did not seem happy with the results. A major company I had some
association with later claimed to have solved the problem, but never
backed it up, even though they promised to.
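The pitch-estimation step mentioned above can be sketched for a single frame with the classic autocorrelation method: pick the lag with the strongest autocorrelation inside a plausible pitch range. This is a minimal single-speaker illustration in Python/NumPy, not the method used in the work Dirk describes; the function name, the 60-400 Hz search range, and the synthetic 200 Hz test frame are all assumptions. With two overlapping talkers, whose pitch tracks can cross, a single peak pick like this is exactly what breaks down.

```python
import numpy as np

def estimate_pitch_autocorr(frame, fs, fmin=60.0, fmax=400.0):
    """Pitch estimate (Hz) from the strongest autocorrelation peak between lags fs/fmax and fs/fmin."""
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()
    # r[lag] for lag >= 0 (biased autocorrelation)
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(x) - 1)
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / lag

fs = 8000
t = np.arange(400) / fs  # one 50 ms frame
# Synthetic voiced frame: 200 Hz fundamental plus two harmonics
frame = (np.sin(2 * np.pi * 200 * t)
         + 0.5 * np.sin(2 * np.pi * 400 * t)
         + 0.3 * np.sin(2 * np.pi * 600 * t))
print(estimate_pitch_autocorr(frame, fs))  # 200.0
```
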

I recently acquired the book 'Independent Component Analysis' by
Hyvärinen et al. (2001). Early on, it discusses speaker separation as
an application of ICA, but in the end it says the problem is 'largely
unsolved' (the data doesn't fit the model). The last time I checked
the literature in detail (2002), there did not seem to be much
accomplished outside of carefully contrived situations.

BTW naebad, the transfer functions you refer to are largely time
varying.

Lars, where do you find MATLAB code to do this? I would guess you
won't. If you do find something that works well over a range of
real-world applications and situations with a single input, I think you
will be a rich man.

Dirk

I think you are confusing two things. When c1 and c2 are constants, the
case is not realistic, and that's why it's easy to solve. When c1 and
c2 are z-transfer functions and, as you point out, time-varying, the
problem is realistic and not easy to solve. The beamforming solution is
best in my opinion: the more mics the merrier. If you have only one
sensor, forget it. ICA works well when c1 and c2 are just constants, but
as I pointed out, this is not the real world.
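naebad's "more mics the merrier" point can be illustrated with the simplest beamformer, delay-and-sum. This Python/NumPy sketch is not naebad's method (the thread does not describe it); it assumes integer sample delays for the desired talker that are known in advance, and uses white noise as a stand-in for the unwanted talker arriving at both mics at the same time. The desired talker passes through unchanged, while the misaligned copies of the interferer only partly add, giving about 3 dB of attenuation in this idealized two-mic case; reverberant speech behaves less favorably.

```python
import numpy as np

def delay_and_sum(mics, delays):
    """Delay-and-sum beamformer.

    mics: array of shape (M, N), one row per microphone.
    delays: delays[m] is the non-negative integer arrival delay (samples)
            of the desired talker at mic m, assumed known.
    """
    mics = np.asarray(mics, dtype=float)
    num_mics, n = mics.shape
    out_len = n - max(delays)
    out = np.zeros(out_len)
    for m, d in enumerate(delays):
        # Advance channel m by d samples so the desired talker lines up across mics
        out += mics[m, d:d + out_len]
    return out / num_mics

rng = np.random.default_rng(0)
N, d = 10000, 3
target = rng.standard_normal(N)
interferer = rng.standard_normal(N)  # white-noise stand-in for the unwanted talker

# Desired talker reaches mic 1 three samples after mic 0; interferer reaches both at once
target_mics = np.vstack([target, np.concatenate([np.zeros(d), target[:-d]])])
interf_mics = np.vstack([interferer, interferer])

# The beamformer is linear, so each component can be passed through separately
target_out = delay_and_sum(target_mics, [0, d])
interf_out = delay_and_sum(interf_mics, [0, d])
print(np.allclose(target_out, target[:N - d]))  # True: desired talker unchanged
print(10 * np.log10(np.mean(interferer ** 2) / np.mean(interf_out ** 2)))  # roughly 3 dB
```
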

Naebad

naebad,

It is my impression (I claim no expertise in ICA) that for ICA to work
well with two constants C1 and C2 mixing two original signals, it would
need two separate mixtures of those signals to process, rather than the
single mixture given in the OP. Please educate me if I am mistaken.
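Dirk's impression matches the standard ICA setup: with two sensors and constant mixing, each sensor sees its own weighted sum of s1 and s2, and the sources can then be recovered up to order, sign, and scale. The sketch below is a minimal two-channel ICA in Python/NumPy (whitening, then a brute-force search for the rotation that maximizes total |excess kurtosis|), not FastICA or any specific published algorithm. The mixing matrix A and the Laplacian/uniform test sources are assumptions chosen for illustration. Note that X has two rows: with only the single mixture from the OP, there is nothing to whiten or rotate.

```python
import numpy as np

def excess_kurtosis(x):
    """Excess kurtosis: 0 for a Gaussian, > 0 for peaky signals, < 0 for flat ones."""
    x = x - x.mean()
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2 - 3.0

def ica_2x2(X, n_angles=360):
    """Separate two instantaneous mixtures X (shape 2 x N) of two independent sources.

    Returns source estimates up to order, sign, and scale.
    """
    X = X - X.mean(axis=1, keepdims=True)
    # Whitening: decorrelate the mixtures and give them unit variance
    eigvals, eigvecs = np.linalg.eigh(np.cov(X))
    Z = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T @ X
    # After whitening only a rotation remains unknown; search it on a grid.
    # Rotating by pi/2 just swaps/negates outputs, so [0, pi/2) is enough.
    best_score, best_S = -np.inf, Z
    for theta in np.linspace(0.0, np.pi / 2, n_angles, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        S = np.array([[c, -s], [s, c]]) @ Z
        score = abs(excess_kurtosis(S[0])) + abs(excess_kurtosis(S[1]))
        if score > best_score:
            best_score, best_S = score, S
    return best_S

rng = np.random.default_rng(1)
N = 20000
S_true = np.vstack([rng.laplace(size=N), rng.uniform(-1.0, 1.0, size=N)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])  # hypothetical constant mixing matrix
X = A @ S_true                          # two sensors, two mixtures
S_hat = ica_2x2(X)

# |correlation| of each true source (rows) with each estimate (columns)
C = np.abs(np.corrcoef(np.vstack([S_true, S_hat]))[:2, 2:])
print(np.round(C, 3))
```
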

I am also curious about how the original simplified problem is 'easily
done' in general. I am aware of several approaches that have been
tried, but none that succeeded in general.

Thanks,

Dirk

Hi again

I was wondering:

Say that we have a random variable Y(t) = a*S1(t) + b*S2(t), where a and b
are unknown constants and S1(t) and S2(t) are correlated random variables.

Is it possible to calculate the probability P(S1(t) = s1) given Y(t)?

If so, how?
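Under extra assumptions the question has a closed-form answer. If S1 and S2 are zero-mean and jointly Gaussian with known variances and covariance, and a and b are known, then S1 given Y = y is Gaussian with mean Cov(S1,Y)/Var(Y)*y and variance Var(S1) - Cov(S1,Y)^2/Var(Y), where Var(Y) = a^2 Var(S1) + b^2 Var(S2) + 2ab Cov(S1,S2) and Cov(S1,Y) = a Var(S1) + b Cov(S1,S2). For a continuous variable P(S1(t) = s1) is zero, so one works with the conditional density instead. Speech is not Gaussian and a, b are unknown in the question as posed, so this Python sketch (function names hypothetical) only shows the shape of the reasoning.

```python
import math

def s1_given_y(y, a, b, var1, var2, cov12):
    """Mean and variance of S1 given Y = y.

    Assumes (S1, S2) zero-mean jointly Gaussian, Y = a*S1 + b*S2, with a, b known.
    """
    var_y = a * a * var1 + b * b * var2 + 2.0 * a * b * cov12
    if var_y <= 0:
        raise ValueError("Var(Y) must be positive")
    cov_s1_y = a * var1 + b * cov12
    mean = cov_s1_y / var_y * y
    var = var1 - cov_s1_y ** 2 / var_y
    return mean, var

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Unit-variance sources with correlation 0.5, a = b = 1, observed y = 2
print(s1_given_y(2.0, 1.0, 1.0, 1.0, 1.0, 0.5))  # (1.0, 0.25)
```
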



Hi,

I am sorry that I am not an ICA person, but I have seen a demo where C1
and C2 were constants and it worked perfectly using ICA. If you leave
your email, I will put you in touch with the professor who did this.
It was a different story when C1 and C2 were time-varying z-transfer
functions, though. My own experience is with beamforming, and I get about
12 dB of reduction in 'noise' (noise being the unwanted speech) using 2 mics
in a real reverberant environment. I am publishing this next year at a
conference. I cannot separate the two, but I am only interested in the
signal, not the 'noise'. The only time you might consider C1 and C2 to be
constants is in an anechoic chamber, and even then they would be pure
delays. In a reverberant environment they would be non-minimum-phase
transfer functions plus pure time delays.
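For reference, the decibel arithmetic behind a figure like "12 dB reduction": noise reduction in dB is 10*log10(P_before/P_after), so 12 dB corresponds to roughly a 16x drop in interfering-speech power, or about 4x in amplitude. A small Python sketch (function names are hypothetical):

```python
import math

def reduction_db(power_before, power_after):
    """Noise reduction in dB from interfering-signal power before and after processing."""
    return 10.0 * math.log10(power_before / power_after)

def db_to_power_ratio(db):
    """Power ratio corresponding to a level difference in dB."""
    return 10.0 ** (db / 10.0)

def db_to_amplitude_ratio(db):
    """Amplitude ratio corresponding to a level difference in dB."""
    return 10.0 ** (db / 20.0)

print(round(db_to_power_ratio(12), 2))      # 15.85
print(round(db_to_amplitude_ratio(12), 2))  # 3.98
```
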

Naebad