Question about source separation
Started by ●November 9, 2005
Hi,

I am looking for some online literature about the following: how do I estimate the speech signals s1 and s2 based on the observation

y[k] = c1*s1[k] + c2*s2[k]

where c1 and c2 are constants? I have searched for MATLAB algorithms for underdetermined source separation (the case where there are more sources than observations) but I couldn't find anything. Hope you guys can help?

Thanks...
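[Editor's note] The model in the question can be sketched numerically to show why a single observation is ill-posed: with one mixture, any perturbation can be traded between the two sources without changing y at all. A minimal illustration, with made-up stand-in signals and constants:

```python
import numpy as np

# Made-up stand-ins for the two speech sources and the mixing constants.
rng = np.random.default_rng(0)
k = np.arange(1000)
s1 = np.sin(2 * np.pi * 0.01 * k)      # "speaker 1"
s2 = rng.standard_normal(k.size)       # "speaker 2"
c1, c2 = 0.7, 1.3

# The only observation: y[k] = c1*s1[k] + c2*s2[k]
y = c1 * s1 + c2 * s2

# A completely different source pair that yields the SAME observation:
d = rng.standard_normal(k.size)        # arbitrary perturbation
s1_alt = s1 + d
s2_alt = s2 - (c1 / c2) * d
y_alt = c1 * s1_alt + c2 * s2_alt

print(np.allclose(y, y_alt))           # one equation per sample, two unknowns
```

Without extra assumptions about the sources, nothing in y distinguishes (s1, s2) from (s1_alt, s2_alt).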
Reply by ●November 9, 2005
Lars Hansen wrote:
> How do I estimate the speech signals s1 and s2 based on the observation
> y[k] = c1*s1[k] + c2*s2[k], where c1 and c2 are constants?

The general problem is ill-posed. Maybe this will help:

A signal subspace approach for speech enhancement
Ephraim, Y.; Van Trees, H.L.
IEEE Transactions on Speech and Audio Processing, Volume 3, Issue 4, July 1995, Pages 251-266
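[Editor's note] The signal-subspace idea behind the reference above can be sketched in toy form. This is an assumed simplification (frame the noisy signal, keep the dominant eigen-directions of its empirical frame covariance), not the Ephraim-Van Trees estimator itself; the signal, frame size, and rank are all invented for the demo:

```python
import numpy as np

# Toy low-rank clean signal (two sinusoids) buried in white noise.
rng = np.random.default_rng(4)
n, frame = 8000, 32
t = np.arange(n)
clean = np.sin(2 * np.pi * 0.03 * t) + 0.5 * np.sin(2 * np.pi * 0.07 * t)
noisy = clean + 0.8 * rng.standard_normal(n)

X = noisy.reshape(-1, frame)           # stack the signal into frames (rows)
R = (X.T @ X) / X.shape[0]             # empirical frame covariance
w, V = np.linalg.eigh(R)               # eigenvalues in ascending order

rank = 6                               # assumed signal-subspace dimension
P = V[:, -rank:] @ V[:, -rank:].T      # projector onto dominant subspace
enhanced = (X @ P).reshape(-1)         # project each frame, reassemble

def mse(x):
    return np.mean((x - clean) ** 2)

print(mse(noisy), mse(enhanced))       # projection discards most of the noise
```

White noise spreads evenly over all eigen-directions, while the low-rank clean signal concentrates in a few, so keeping only the dominant subspace trades a little signal distortion for a large noise reduction.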
Reply by ●November 9, 2005
Lars,
Search the internet for
cochannel speaker separation
or
cochannel voice separation
Good Luck,
Dirk
Lars Hansen wrote:
> [original question snipped]
Reply by ●November 9, 2005
That's actually not too difficult. Normally c1 and c2 are z-transfer functions with delays etc. They are non-minimum phase, of course.

Naebad
Reply by ●November 9, 2005
OK... where do I find the algorithm that solves this problem? (In MATLAB, please.)

Thanks

"naebad" <minnaebad@yahoo.co.uk> wrote in message news:1131568578.084152.48070@o13g2000cwo.googlegroups.com...
> That's actually not too difficult. Normally c1 and c2 are z-transfer
> functions with delays etc. They are non-min phase of course.
Reply by ●November 9, 2005
naebad,

'Not too difficult' is not true in the applications I have seen:

a) multiple people engaged in multiple conversations in approximately the same location (bar, restaurant), when you want to hear only one of them, or maybe several of them separately, but have only a single microphone or single-channel recording;

b) multiple people using the same communications channel for possibly different conversations, with a distant sensor picking them both up on the same channel;

c) recordings of telephone calls with people simultaneously talking in the background.

Unfortunately, what we often see on 'CSI: Miami' is pure fantasy. People have been working on this problem at least as far back as the early 80's (probably longer). I followed what was being done in certain areas from then until about 1992; little was accomplished. There was early work using pitch estimation of both speakers, pitch tracking for each speaker (pitch tracks normally cross, BTW), spectral separation, and then reconstruction. Outside of a lab, users did not seem happy with the results. A major company I had some association with later claimed to have solved the problem, but would never back it up, even though they promised to.

I recently acquired the book 'Independent Component Analysis' by Hyvarinen et al. (2001), which early on discusses speaker separation as an application of ICA, but in the end says the problem is 'largely unsolved' (the data doesn't fit the model). The last time I checked the literature in detail (2002), there did not seem to be much accomplished outside of carefully contrived situations. BTW, naebad, the transfer functions you refer to are largely time-varying.

Lars, where do you find MATLAB code to do this? I would guess you don't. If you do find something that works well over a range of real-world applications and situations with a single input, I think you will be a rich man.

Dirk
Reply by ●November 9, 2005
I think you are confusing two things. When c1 and c2 are constants, the case is not realistic, and that's why it's easy to solve. When c1 and c2 are z-transfer functions and, as you point out, time-varying, then the problem is realistic and not easy to solve. The beamforming solution is best in my opinion; the more mics the merrier. If you have only one sensor, then forget it. ICA works well when c1 and c2 are just constants, but as I pointed out, this is not the real world.

Naebad
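[Editor's note] The beamforming point can be illustrated with a toy two-microphone delay-and-sum sketch. All signals, gains, and the inter-mic delay here are invented for the demo, and the delay is an integer number of samples; real arrays need fractional delays, steering, and calibration:

```python
import numpy as np

# Target arrives at both mics in phase; the interferer arrives with an
# (assumed known) inter-mic delay, so summing reinforces only the target.
rng = np.random.default_rng(1)
n = 4000
target = np.sin(2 * np.pi * 0.05 * np.arange(n))
interf = rng.standard_normal(n)            # unwanted speech stand-in

delay = 8                                  # interferer's inter-mic delay, samples
mic1 = target + interf
mic2 = target + np.roll(interf, delay)     # circular delay, fine for a toy

beam = 0.5 * (mic1 + mic2)                 # steer toward the target (zero delay)
resid = 0.5 * (interf + np.roll(interf, delay))   # interference left in `beam`

def sir_db(sig, noise):
    return 10 * np.log10(np.mean(sig**2) / np.mean(noise**2))

print(f"single mic SIR: {sir_db(target, interf):.1f} dB")
print(f"beamformed SIR: {sir_db(target, resid):.1f} dB")
```

With a broadband interferer that is decorrelated at the inter-mic delay, two mics buy roughly 3 dB here; more mics (and adaptive weights) buy more, which is the "more mics the merrier" point.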
Reply by ●November 10, 2005
naebad,

It is my impression (I claim no expertise in ICA) that for ICA to work well with two constants c1 and c2 used to mix two original signals, two separate mixtures of the two original signals would be required, rather than the one signal mixture given in the OP. Please educate me if I am mistaken. I am also curious about how the original simplified problem is 'easily done' in general. I am aware of several approaches that have been tried, but none that succeeded in general.

Thanks,
Dirk
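[Editor's note] Dirk's impression matches standard ICA: with two mixtures of two sources the mixing matrix is square and the sources are recoverable up to order, sign, and scale. A toy whiten-and-rotate sketch (a brute-force stand-in for FastICA, with made-up sources and mixing matrix, not production code):

```python
import numpy as np

# Two non-Gaussian sources and a 2x2 mixing matrix: TWO observations.
rng = np.random.default_rng(2)
n = 20000
s1 = np.sign(rng.standard_normal(n))       # sub-Gaussian source
s2 = rng.laplace(size=n)                   # super-Gaussian source
S = np.vstack([s1, s2])
A = np.array([[0.7, 1.3], [0.9, -0.4]])    # made-up mixing matrix
X = A @ S

# Whiten the mixtures: after this, separation is just a rotation.
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
Z = (E / np.sqrt(d)) @ E.T @ X

def kurt(u):
    return np.mean(u**4) - 3.0             # excess kurtosis, non-Gaussianity proxy

# Brute-force search for the rotation angle maximizing |kurtosis|.
best = max(np.linspace(0, np.pi, 500),
           key=lambda t: abs(kurt(np.cos(t) * Z[0] + np.sin(t) * Z[1])))
W = np.array([[np.cos(best), np.sin(best)],
              [-np.sin(best), np.cos(best)]])
Y = W @ Z                                  # recovered sources (order/sign/scale ambiguous)

# Each recovered component should correlate strongly with one true source.
corr = np.abs(np.corrcoef(np.vstack([Y, S]))[:2, 2:])
print(np.round(corr, 2))
```

With only one mixture there is no square system to invert and this construction has nothing to rotate, which is why the OP's single-channel case is out of reach for plain ICA.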
Reply by ●November 10, 2005
Hi again,

I was wondering: say we have a random variable Y(t) = a*S1(t) + b*S2(t), where a and b are unknown constants and S1(t) and S2(t) are correlated random variables. Is it possible to calculate the probability P(S1(t) = s1) given Y(t)? If so, how?
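[Editor's note] One hedged note on the question above: for continuous-valued signals the point probability P(S1(t) = s1) is zero; what one can compute is the conditional density p(s1 | y). Under an assumed zero-mean jointly Gaussian model for (S1, S2) with known constants and second-order statistics (all values below are made up), that density is Gaussian with closed-form mean and variance:

```python
import numpy as np

# Assumed model parameters (illustrative, not from the thread).
a, b = 0.7, 1.3
var1, var2, rho = 1.0, 2.0, 0.5
cov12 = rho * np.sqrt(var1 * var2)

# Closed-form Gaussian conditioning: E[S1|Y=y] = gain * y.
cov_s1_y = a * var1 + b * cov12
var_y = a**2 * var1 + b**2 * var2 + 2 * a * b * cov12
gain = cov_s1_y / var_y
cond_var = var1 - cov_s1_y**2 / var_y      # posterior variance of S1 given Y

# Monte Carlo check: regress S1 on Y and compare with the analytic gain.
rng = np.random.default_rng(3)
C = np.array([[var1, cov12], [cov12, var2]])
s = rng.multivariate_normal([0.0, 0.0], C, size=200000)
y = a * s[:, 0] + b * s[:, 1]
gain_mc = np.dot(s[:, 0], y) / np.dot(y, y)

print(gain, gain_mc, cond_var)
```

So under the Gaussian assumption the answer to "how" is linear MMSE estimation: p(s1 | y) is Normal(gain * y, cond_var). If a and b are genuinely unknown, or the signals are non-Gaussian (speech is), the problem reverts to the hard separation problem discussed above.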
Reply by ●November 10, 2005
Hi,

I am sorry that I am not an ICA person, but I have seen a demo where c1 and c2 were constants, and it worked perfectly using ICA. If you leave your email I will put you in touch with the professor who did this. It was a different story when c1 and c2 were time-varying z-transfer functions, though.

My own experience is with beamforming, and I get about a 12 dB reduction in 'noise' (the noise being the unwanted speech) using 2 mics in a real reverberant environment. I am publishing this next year at a conference. I cannot separate the two, but I am only interested in the signal, not the 'noise'.

The only time you might consider c1 and c2 to be constants is in an anechoic chamber, and even then they would be pure delays. In a reverberant environment they would be non-minimum-phase transfer functions plus pure time delays.

Naebad