speech separation

Started by Matt Roos January 6, 2004
What are some common algorithms for the separation of simultaneous speech?
I'd appreciate any references you can provide.  I'm interested in
simultaneous speech (the cocktail party problem) as well as a similar one...
a recorded conversation between two speakers.  In both cases there are two
sources and two microphones.  The only difference is one is a conversation
while in the other both speakers may be speaking simultaneously much of the
time.

I'm interested in actual implementation on a modern PC with near real-time
processing.

One of the algorithms I've come across is based on kurtosis.  Has anyone
tried this?

Thanks,
Matt

-- 
Remove Xs from address to reply via e-mail.


Hello Matt-san,

The problem appears interesting. As I understand it, the problem is:

  one-channel input ---> [ deconvolution module ] ---> many-channel outputs

If the above is correct, then we need to get the SIMO (single-input,
multi-output) transfer function.

Maybe we can try multichannel system identification (based on Pade or
Prony approximation, which I used during my Ph.D. days) to get the
transfer function.

More might come out if we keep thinking about this problem.

All the best.

Kind Regards
jk
jk@epigon.co.in

I think the original problem sounds more like a MIMO source separation
(SS) problem.  Assuming you don't have access to the sources, it becomes
a blind SS problem.  Besides kurtosis, which is a higher-order
statistic (HOS), there are many other ways of working on the
problem.  Also, HOS methods need a large number of received samples
at the microphones to work (more than, say, second-order statistics
methods).  You can also treat it as a beamforming
problem.  You can start by looking at A. Hyvarinen's book
on Independent Component Analysis, 2001.

hth,

cf
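
A minimal sketch of the kurtosis idea mentioned above, assuming an instantaneous (constant-gain, no delay or reverberation) two-source, two-microphone mixture. It whitens the two mixtures and then searches for the rotation that makes the outputs as non-Gaussian as possible, measured by excess kurtosis. The function names, the grid search, and the Laplacian test signals are illustrative assumptions, not something taken from the thread or from a particular paper.

```python
import numpy as np

def whiten(x):
    """Zero-mean the rows of x (channels x samples) and decorrelate them to unit variance."""
    x = x - x.mean(axis=1, keepdims=True)
    cov = x @ x.T / x.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Symmetric whitening matrix E D^{-1/2} E^T
    return (eigvecs / np.sqrt(eigvals)) @ eigvecs.T @ x

def excess_kurtosis(y):
    """Excess kurtosis of a zero-mean signal (close to 0 for Gaussian data)."""
    return np.mean(y**4) / np.mean(y**2) ** 2 - 3.0

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [-s, c]])

def separate_by_kurtosis(x, n_angles=720):
    """Whiten two mixtures, then pick the rotation that maximizes total |kurtosis|."""
    z = whiten(x)
    # After whitening only a rotation is unknown; [0, pi/2) covers every
    # solution up to the usual sign and ordering ambiguity.
    angles = np.linspace(0.0, np.pi / 2, n_angles, endpoint=False)
    scores = []
    for theta in angles:
        y = rotation(theta) @ z
        scores.append(abs(excess_kurtosis(y[0])) + abs(excess_kurtosis(y[1])))
    return rotation(angles[int(np.argmax(scores))]) @ z

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sources = rng.laplace(size=(2, 20000))       # heavy-tailed stand-ins for speech
    mixing = np.array([[1.0, 0.6], [0.4, 1.0]])  # hypothetical constant gains
    estimates = separate_by_kurtosis(mixing @ sources)
    print(np.abs(np.corrcoef(np.vstack([sources, estimates]))[:2, 2:]))
```

The outputs are recovered only up to sign, scale, and ordering. As noted above, the fourth-order statistic needs a fairly long block of samples before the estimate settles.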

Could we build a model that recognises a particular speaker,
for example with a neural network?

thanks for all the comments so far.  to clarify/summarize a few things...

this is a multi-input multi-output (MIMO) problem.  in particular, i'm
currently interested in two inputs, two outputs.

unfortunately i cannot train it on any individual voice.  it must remain
generic for any speaker (of any language as well).

thanks again for the comments!

matt

Hello Matt,

> this is a multi-input multi-output (MIMO) problem.  in particular, i'm
> currently interested in two inputs, two outputs.

No (I suppose). The problem is SIMO, because both mic inputs are added and
become one signal. If you have both mic outputs available to you, then the
problem is not difficult, because two separate channels are available.

Kind Regards
jk

You will not be able to separate both voices, only attenuate one of them by
maybe up to 6 dB, being optimistic.

Try an acoustic beamformer (2 input). Can you post the data (a short section)
somewhere? I may have a go myself! How far apart are the microphones, and what
sort of environment was it recorded in (i.e. reverberant or anechoic)?

Tom

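
A minimal sketch of a two-input delay-and-sum beamformer of the kind suggested above, assuming integer-sample delays and no reverberation. The helper names (`shift`, `estimate_delay`, `delay_and_sum`) are hypothetical, and the delay estimate is a plain cross-correlation peak search rather than any specific published method.

```python
import numpy as np

def shift(x, n):
    """Delay x by n samples (n may be negative), zero-padding instead of wrapping."""
    y = np.zeros_like(x)
    if n > 0:
        y[n:] = x[:-n]
    elif n < 0:
        y[:n] = x[-n:]
    else:
        y[:] = x
    return y

def estimate_delay(mic1, mic2, max_lag):
    """Number of samples by which mic2 lags mic1, from the cross-correlation peak."""
    lags = range(-max_lag, max_lag + 1)
    scores = [np.dot(mic1, shift(mic2, -lag)) for lag in lags]
    return lags[int(np.argmax(scores))]

def delay_and_sum(mic1, mic2, delay_samples):
    """Advance mic2 by the target's delay so the target adds coherently, then average."""
    return 0.5 * (mic1 + shift(mic2, -delay_samples))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.standard_normal(8000)
    interferer = rng.standard_normal(8000)
    # Hypothetical geometry: the target reaches mic 2 three samples late,
    # the interferer reaches mic 2 four samples early.
    mic1 = target + interferer
    mic2 = shift(target, 3) + shift(interferer, -4)
    steering = estimate_delay(target, shift(target, 3), max_lag=10)
    output = delay_and_sum(mic1, mic2, steering)
    print(steering, np.mean((output - target) ** 2) / np.mean(interferer**2))
```

Steering toward one talker leaves that talker essentially unchanged while the other is averaged with a misaligned copy of itself, so it is attenuated rather than removed. With only two microphones the attenuation is modest.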
no, it is indeed MIMO.  the inputs to the mikes are not identical as the
positions are different and the relative positions of the speakers are
different.  speaker 1 may be closer to mic 1, but some of speaker 2's voice
will get in there as well.  the reverse would be true for mic 2.

two inputs (the two speakers)
two outputs (the two microphone outputs)
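
Written out, the two-speaker, two-microphone model described above looks roughly as follows. The symbols are notation introduced here for illustration: s_j(t) are the speakers, x_i(t) the microphone signals, a_ij constant gains, and h_ij(k) room impulse responses of assumed length K. The first pair is the constant-gain case and the last line is the more realistic convolutive case.

```latex
\begin{aligned}
x_1(t) &= a_{11}\, s_1(t) + a_{12}\, s_2(t) \\
x_2(t) &= a_{21}\, s_1(t) + a_{22}\, s_2(t) \\[4pt]
x_i(t) &= \sum_{j=1}^{2} \sum_{k=0}^{K-1} h_{ij}(k)\, s_j(t-k), \qquad i = 1, 2
\end{aligned}
```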

Matt

i REALLY wish i had data to give.  i'm in the undesirable position of
putting together a rudimentary algorithm without any decent test data.  i've
just had to synthesize it by adding together some segments i've gotten off
the internet.

for anyone interested, i got some pretty good results last night using a
decorrelation algorithm as proposed by Weinstein [1993, Trans on Speech and
Audio Processing].  It's pretty simple and I assumed the two signals are
coupled together only using constant gains to further simplify
implementation (which is exactly how i synthesized it).  Maybe not all
that interesting and effective in other cases, but not a terrible start.
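
For the constant-gain case, a second-order decorrelation approach can be sketched as below. This is not the Weinstein algorithm referenced above. It is a related batch method (often known as AMUSE in the ICA literature) that whitens the mixtures and then rotates them so the outputs are also uncorrelated at one nonzero lag. It assumes the two sources have different autocorrelation at that lag. Function names and the test signals are illustrative assumptions.

```python
import numpy as np

def lagged_cov(z, lag):
    """Symmetrized covariance between z(t) and z(t - lag); z is channels x samples."""
    n = z.shape[1] - lag
    c = z[:, lag:] @ z[:, :n].T / n
    return 0.5 * (c + c.T)

def separate_by_decorrelation(x, lag=1):
    """Whiten the mixtures, then rotate so the outputs are uncorrelated at `lag` too."""
    x = x - x.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(x @ x.T / x.shape[1])
    z = (eigvecs / np.sqrt(eigvals)).T @ x          # whitening: D^{-1/2} E^T x
    # The eigenvectors of the lagged covariance give the remaining rotation.
    _, rot = np.linalg.eigh(lagged_cov(z, lag))
    return rot.T @ z

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 20000
    # Two coloured-noise stand-ins for speech with different spectra.
    s1 = np.convolve(rng.standard_normal(n), 0.9 ** np.arange(50))[:n]
    s2 = np.convolve(rng.standard_normal(n), (-0.5) ** np.arange(50))[:n]
    sources = np.vstack([s1, s2])
    mixing = np.array([[1.0, 0.6], [0.4, 1.0]])     # hypothetical constant gains
    estimates = separate_by_decorrelation(mixing @ sources)
    print(np.abs(np.corrcoef(np.vstack([sources, estimates]))[:2, 2:]))
```

Because it uses only second-order statistics it tends to need fewer samples than a kurtosis-based method, but it fails if both sources have the same autocorrelation at the chosen lag. The outputs are again defined only up to sign, scale, and ordering.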

if time allows, i may try the "natural gradient convolutive blind source
separation" technique proposed by Scott Douglas.  seems interesting.

Matt

Matt Roos wrote:

> It's pretty simple and I assumed the two signals are
> coupled together only using constant gains to further simplify

The constant gain coupling case is quite easy to solve. You need to put
transfer functions in where the gains were. The transfer functions (FIR)
can also be non-minimum phase.

Tom
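
To make the test data less forgiving along these lines, the gains in a synthetic mixture can be replaced by FIR filters. The sketch below is one way to do that, with made-up filter taps. `convolutive_mix` and `is_minimum_phase` are hypothetical helper names, and the minimum-phase check simply tests whether all zeros of the FIR lie inside the unit circle.

```python
import numpy as np

def convolutive_mix(sources, filters):
    """Mix sources (n_src x samples) through FIR filters; filters[i][j] maps source j to mic i."""
    n_samples = sources.shape[1]
    mics = np.zeros((len(filters), n_samples))
    for i, row in enumerate(filters):
        for j, h in enumerate(row):
            # Truncate the convolution tail so every mic signal keeps the input length.
            mics[i] += np.convolve(sources[j], h)[:n_samples]
    return mics

def is_minimum_phase(h):
    """True if every zero of the FIR filter h lies strictly inside the unit circle."""
    return bool(np.all(np.abs(np.roots(h)) < 1.0))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sources = rng.standard_normal((2, 16000))
    # Hypothetical taps: direct paths are plain gains, cross paths add delay and an echo.
    filters = [
        [np.array([1.0]), np.array([0.0, 0.0, 0.5, 0.0, 0.2])],
        [np.array([0.5, 1.0]), np.array([1.0])],   # [0.5, 1.0] has its zero at z = -2
    ]
    mics = convolutive_mix(sources, filters)
    print(mics.shape, is_minimum_phase(filters[1][0]))
```

A cross path with a zero outside the unit circle, such as the `[0.5, 1.0]` example, has no stable causal inverse, which is one reason the convolutive case is harder than the constant-gain one.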