What are some common algorithms for the separation of simultaneous speech? I'd appreciate any references you can provide.

I'm interested in simultaneous speech (the cocktail party problem) as well as a similar problem: a recorded conversation between two speakers. In both cases there are two sources and two microphones. The only difference is that one is a conversation, while in the other both speakers may be speaking simultaneously much of the time.

I'm interested in actual implementation on a modern PC with near real-time processing.

One of the algorithms I've come across is based on kurtosis. Has anyone tried this?

Thanks,
Matt

--
Remove Xs from address to reply via e-mail.
speech separation
Started by ●January 6, 2004
Reply by ●January 7, 2004
Hello Matt san,

The problem appears to be interesting. As I understand it, the problem is:

one channel input ---> [deconvolution module] ---> many channel outputs

If the above is correct, then we need to get a SIMO (single-input, multi-output) transfer function.

Maybe we can try multichannel system identification (based on Pade or Prony approximation, which I used during my Ph.D. days) to get the transfer function.

More might come out if we keep thinking about this problem.

All the best.

Kind Regards
jk
jk@epigon.co.in
Reply by ●January 7, 2004
I think the original problem sounds more like a MIMO source separation (SS) problem. Assuming you don't have access to the sources, it becomes a blind SS problem. Besides kurtosis, which is a higher-order statistic (HOS), there are many other ways of working out the problem. Note also that HOS methods require a large number of received samples at the microphones to work (more than, say, second-order statistics methods). You can also treat such a problem as a beamforming problem.

You can start off by looking at the book by A. Hyvarinen on Independent Component Analysis, 2001.

hth,
cf
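To make the kurtosis idea concrete, here is a minimal sketch for the simplest case only: two sources mixed by constant gains, with no delays or reverberation. It whitens the two microphone channels and then searches for the rotation that maximizes the summed absolute excess kurtosis of the two outputs. The function names (`kurtosis`, `whiten`, `separate_by_kurtosis`), the grid search over angles, and the instantaneous-mixing assumption are illustrative choices, not something taken from a post in this thread or from a particular library. Real room recordings are convolutive and need more than this.

```python
import math

def kurtosis(v):
    """Excess kurtosis of a zero-mean sequence (0 for a Gaussian)."""
    n = len(v)
    m2 = sum(a * a for a in v) / n
    m4 = sum(a ** 4 for a in v) / n
    return m4 / (m2 * m2) - 3.0

def whiten(x1, x2):
    """Remove means, decorrelate, and scale both channels to unit variance."""
    n = len(x1)
    mu1, mu2 = sum(x1) / n, sum(x2) / n
    x1 = [a - mu1 for a in x1]
    x2 = [b - mu2 for b in x2]
    c11 = sum(a * a for a in x1) / n
    c22 = sum(b * b for b in x2) / n
    c12 = sum(a * b for a, b in zip(x1, x2)) / n
    # Rotation angle that diagonalizes the 2x2 covariance matrix.
    th = 0.5 * math.atan2(2.0 * c12, c11 - c22)
    c, s = math.cos(th), math.sin(th)
    u = [c * a + s * b for a, b in zip(x1, x2)]
    v = [-s * a + c * b for a, b in zip(x1, x2)]
    su = math.sqrt(sum(a * a for a in u) / n)
    sv = math.sqrt(sum(b * b for b in v) / n)
    return [a / su for a in u], [b / sv for b in v]

def separate_by_kurtosis(x1, x2, steps=90):
    """Rotate the whitened channels to maximize |kurtosis| of both outputs.

    Assumes an instantaneous (constant-gain) two-by-two mixture of
    independent, non-Gaussian sources.
    """
    z1, z2 = whiten(x1, x2)
    best = None
    for k in range(steps):
        # Rotations repeat every 90 degrees up to order and sign.
        phi = 0.5 * math.pi * k / steps
        c, s = math.cos(phi), math.sin(phi)
        y1 = [c * a + s * b for a, b in zip(z1, z2)]
        y2 = [-s * a + c * b for a, b in zip(z1, z2)]
        score = abs(kurtosis(y1)) + abs(kurtosis(y2))
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]
```

Calling `separate_by_kurtosis(x1, x2)` on two equal-length lists of samples returns two output lists. The outputs come back in arbitrary order, sign, and scale, which is inherent to blind separation. Because kurtosis is a fourth-order statistic, the estimate is noisy on short blocks, which matches the remark above about HOS needing many samples.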
Reply by ●January 7, 2004
Can we set up a model that can recognise a certain person? Like a neural network?
Reply by ●January 7, 2004
Thanks for all the comments so far. To clarify/summarize a few things...

This is a multi-input multi-output (MIMO) problem. In particular, I'm currently interested in two inputs, two outputs.

Unfortunately I cannot train it on any individual voice. It must remain generic for any speaker (of any language as well).

Thanks again for the comments!

Matt
Reply by ●January 7, 2004
Hello Matt,

> this is a multi-input multi-output (MIMO) problem. in particular, i'm
> currently interested in two inputs, two outputs.

No (I suppose). The problem is SIMO. This is because both mic inputs are added and become one signal. If you have both mic outputs available to you, then the problem is not difficult, because two separate channels are available.

Kind Regards
jk
Reply by ●January 7, 2004
You will not be able to separate both voices, only attenuate one of them, by maybe up to 6 dB being optimistic. Try an acoustic beamformer (2 input).

Can you post the data (a short section) somewhere? I may have a go myself! How far apart are the microphones, and what sort of environment was it recorded in (i.e. reverberant or anechoic)?

Tom
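As a rough illustration of the two-input beamformer idea, the simplest form is delay-and-sum: time-align the wanted speaker in both channels, then average. The sketch below assumes the wanted speaker's sound reaches microphone 2 a whole number of samples after microphone 1. The name `delay_and_sum` and its `delay` parameter are hypothetical, and fractional delays, reverberation, and adaptive weighting are all ignored.

```python
def delay_and_sum(x1, x2, delay):
    """Average two microphone signals after delaying x1 by `delay` samples.

    Assumes the wanted source reaches microphone 2 `delay` samples (>= 0)
    later than microphone 1, so delaying x1 lines the two copies up.
    Samples before the start of the recording are treated as zero.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative; swap the inputs instead")
    n = min(len(x1), len(x2))
    out = []
    for i in range(n):
        delayed = x1[i - delay] if i >= delay else 0.0
        out.append(0.5 * (delayed + x2[i]))
    return out
```

The aligned speaker passes through at full level, while sound from other directions is not lined up and is only partly attenuated. How much attenuation you get depends on frequency, microphone spacing, and the room.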
Reply by ●January 7, 2004
No, it is indeed MIMO. The inputs to the mikes are not identical, as the positions are different and the relative positions of the speakers are different. Speaker 1 may be closer to mic 1, but some of speaker 2's voice will get in there as well. The reverse would be true for mic 2.

Two inputs (the two speakers)
Two outputs (the two microphone outputs)

Matt
Reply by ●January 8, 2004
I REALLY wish I had data to give. I'm in the undesirable position of putting together a rudimentary algorithm without any decent test data. I've just had to synthesize it by adding together some segments I've gotten off the internet.

For anyone interested, I got some pretty good results last night using a decorrelation algorithm as proposed by Weinstein [1993, Trans on Speech and Audio Processing]. It's pretty simple, and I assumed the two signals are coupled together only through constant gains to further simplify implementation (which is exactly how I synthesized the data). Maybe not all that interesting or effective in other cases, but not a terrible start.

If time allows, I may try the "natural gradient convolutive blind source separation" technique proposed by Scott Douglas. Seems interesting.

Matt
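For readers who want to experiment with the constant-gain case described above, here is a brute-force sketch of the decorrelation criterion. It is not Weinstein's adaptive algorithm. It assumes the model x1 = s1 + a*s2 and x2 = b*s1 + s2 with |a| and |b| below 1 (each speaker is louder in the nearer microphone), and that the two sources have different autocorrelations at the lags used. The names `xcorr` and `decorrelate_constant_gains`, the choice of lags, and the grid search are all illustrative assumptions.

```python
def xcorr(u, v, lag):
    """Sample estimate of E[u[n] * v[n - lag]] for lag >= 0."""
    n = len(u)
    return sum(u[i] * v[i - lag] for i in range(lag, n)) / (n - lag)

def decorrelate_constant_gains(x1, x2, lags=(0, 1, 2), step=0.01, limit=0.95):
    """Find c1, c2 so that y1 = x1 - c1*x2 and y2 = x2 - c2*x1 are
    uncorrelated at the given lags. Returns (c1, c2).

    Assumed model: x1 = s1 + a*s2, x2 = b*s1 + s2 with |a|, |b| < limit.
    """
    # Correlations of the observed channels, computed once per lag.
    stats = [(xcorr(x1, x1, k), xcorr(x2, x2, k),
              xcorr(x1, x2, k), xcorr(x2, x1, k)) for k in lags]
    steps = int(round(limit / step))
    grid = [i * step for i in range(-steps, steps + 1)]
    best = (float("inf"), 0.0, 0.0)
    for c1 in grid:
        for c2 in grid:
            cost = 0.0
            for r11, r22, r12, r21 in stats:
                # E[y1[n] * y2[n - lag]] expanded in terms of x1 and x2.
                e = r12 - c2 * r11 - c1 * r22 + c1 * c2 * r21
                cost += e * e
            if cost < best[0]:
                best = (cost, c1, c2)
    return best[1], best[2]
```

With the estimated gains, `y1 = x1 - c1*x2` and `y2 = x2 - c2*x1` are scaled estimates of the two sources. Decorrelation at lag zero alone gives one equation for two unknowns, which is why more than one lag is used here.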
Reply by ●January 23, 2004
Matt Roos wrote:

> It's pretty simple and I assumed the two signals are
> coupled together only using constant gains to further simplify

The constant-gain coupling case is quite easy to solve. You need to put transfer functions in where the gains were. The transfer functions (FIR) can also be non-minimum phase.

Tom
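To synthesize test data along those lines, each constant gain can be replaced by a short FIR filter, so every speaker-to-microphone path has its own impulse response. The sketch below is one way to do that. The names `fir_filter` and `convolutive_mix` and the example coefficients are made up for illustration and do not model any real room.

```python
def fir_filter(h, s):
    """Causal FIR filtering: y[n] = sum_k h[k] * s[n - k], same length as s."""
    return [sum(h[k] * s[n - k] for k in range(min(len(h), n + 1)))
            for n in range(len(s))]

def convolutive_mix(s1, s2, h11, h12, h21, h22):
    """Two-source, two-microphone mixture where every path is an FIR filter.

    hij is the impulse response from source j to microphone i.
    Single-tap filters reduce this to the constant-gain case.
    """
    x1 = [p + q for p, q in zip(fir_filter(h11, s1), fir_filter(h12, s2))]
    x2 = [p + q for p, q in zip(fir_filter(h21, s1), fir_filter(h22, s2))]
    return x1, x2
```

For example, a cross path such as `h21 = [0.5, 1.0]` has its zero at z = -2, outside the unit circle, so it is non-minimum phase. Its exact inverse is not a stable causal filter, which is one reason the convolutive problem is harder than the constant-gain one.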






