comp.dsp | Post-doc in Acoustic Event Detection (US citizens only)

Reply by Rune Allnor ●August 26, 2006

tony.nospam@nospam.tonyRobinson.com wrote:
> "Rune Allnor" <allnor@tele.ntnu.no> writes:
>
> > tony.nospam@nospam.tonyRobinson.com wrote:

> > > So your problem is one of coverage?   I see no claim that the target has
> > > to occur in any database.
> >
> > So how are you going to find it?
>
> Who says that you are going to?

I admit that I have slid to your level of precision; I will make an
effort not to let that happen again. I used "find" as a synonym for
"detect". The ad stated "detection of..."

> I think it perfectly reasonable that
> you output a confidence that the target word appears in every database
> entry.

OK, this is ridiculous. You obviously differentiate between "find" and
"detect", so I'll join you in that game: you are going to "detect" a
target but not "find" it... In my world the difference between the two
is the difference between the somewhat general statement "a target is
present in the scenario" and "the target is located at position
(x,y,z)" [for the record: I use "find" as yet another synonym, this
time for "locate"].

I don't have the slightest clue what you mean when you defend
the ad, where "detection" is a key element, and now object to
my use of "find".

> > > Now we populate our database with sounds, it might be a parrot saying "I
> > > know what Ambiguous means!", a computer synthesising "Usenet ranting is
> > > a waste of time" or anything else.  The patterns can be any sound which
> > > covers all possible words.  Are you okay with this or would you like to
> > > argue that you'll need an infinite database to store all possible words?
> >
> > The signal can be any sound, then. Whether it is a word or anything
> > else is unspecified, as far as I am concerned.
>
> Yes, I've generalised to any sound so that you're not concerned with the
> complexities of some language that is not available to the researcher.
> Let's keep things simple if we can.

Ah, what is this? Are words that belong to "some language that is not
available to the researcher" by any chance "impossible" according to
the ad's explicit mention of "all possible words"?

Why this simplification all of a sudden?

> > > We run our matching algorithm, it comes up with a result, perhaps it's
> > > good, perhaps not.  Are you happy with this or do you expect perfect
> > > detection?
> >
> > I can't see what relevance this matching has to anything useful.
> > Any matching algorithm will produce a result on any data. Whether
> > the match makes sense or is useful, depends on whether the
> > processing algorithm is based on a representation that
> > matches the data.
>
> True enough.  I'm not arguing for any particular algorithm, just that
> it's reasonable to allow for any sound to be the target and any sound to
> be in the database as background.  This seemed to be one of your main
> objections.

My main objection is that you claim to be able to search through
"all possible words".

> > What makes you think you can find such an elusive
> > signal structure as a "word" when DSP methods that
> > have been researched extensively for 30 years cannot
> > tell whether or not a signal segment really contains such
> > a well-defined structure as a monochromatic sinusoid?
>
> It's easy to add sufficient noise to any pattern matching problem such
> that the probability of detection becomes vanishingly small.  This is as
> true of a target word in background noise as it is of a single sinusoid
> in noise.

No. Hiding words is easier. The term "sinusoidal" has a very
specific meaning and a very specific mathematical representation.
Mathematical algorithms can be designed that take advantage
of these properties, and push the limits of what can be detected.
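Rune's point is that a sinusoid comes with an explicit signal model a detector can exploit. As a rough illustration not taken from the thread, the sketch below is the textbook periodogram peak test for one sinusoid in white Gaussian noise, assuming the noise variance is known; the function name `detect_sinusoid` and the parameter `p_fa` (target false-alarm probability) are illustrative. Under noise alone each normalised interior periodogram bin is exponentially distributed with unit mean, which is what lets the threshold be computed in closed form.

```python
import numpy as np

def detect_sinusoid(x, noise_var, p_fa=1e-3):
    """Periodogram peak test for one sinusoid in white Gaussian noise of known variance.
    Returns (detected, frequency in cycles per sample of the peak bin)."""
    n = len(x)
    # Interior bins only (no DC, no Nyquist): under noise alone each normalised
    # bin is exponential with unit mean, and the bins are independent.
    spec = np.abs(np.fft.rfft(x))[1:(n - 1) // 2 + 1] ** 2 / (n * noise_var)
    m = len(spec)
    # Threshold chosen so that P(max of m noise-only bins > gamma) = p_fa.
    gamma = -np.log(1.0 - (1.0 - p_fa) ** (1.0 / m))
    k = int(np.argmax(spec))
    return bool(spec[k] > gamma), (k + 1) / n

rng = np.random.default_rng(0)
t = np.arange(256)
x = np.cos(2 * np.pi * 0.125 * t) + rng.standard_normal(256)
print(detect_sinusoid(x, noise_var=1.0))
```

Shrinking the cosine amplitude relative to `noise_var` pushes the peak below `gamma`, which is Tony's point about added noise; the explicit model is what makes the threshold analytic, which is Rune's.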

> This just makes it a harder task and more interesting for some
> people -

Maybe. There are people whose purpose in life is to prove the
existence of the Loch Ness monster, the Yeti or Sasquatch (sp?).

> not an irresponsibly worded advert or indication of evil
> employers out to depress new PhDs into abandoning their careers.

I am not trying to depress PhD students into abandoning their
careers. I am using my own experiences to warn them to
watch out for tell-tale signs of bad projects.

> > > You've clearly got a problem matching one sound against all other
> > > possible sounds based on your past experience, I just don't see how that
> > > relates to this job ad.
> >
> > OK, define a "word". Does a human have to utter the sound, or is it
> > sufficient that the parrot utters it? Is everything a human utters a
> > "word"?
> > Having defined the "word" somehow, how do you separate a "word" in a
> > recording from a mixture involving other types of sounds?
>
> This is just irrelevant detail, I still don't know why you are hung up
> on it.

Because stating a defining property of whatever you are looking for
is the key to finding it. You have made every possible twist and turn
to avoid coming up with a definition. Maybe Sherlock Holmes can
tell Dr Watson to search an apartment for something and answer
"you will know it when you see it" when asked what to search for.
As you hopefully know, Sherlock Holmes and Dr Watson are
fictional characters. You -- and your PhD students -- live in real
life. You will have to know what to look for in order to find it.

>  We don't know the details of the project but it's reasonable to
> assume we've got an instance of the target "word", so it's just a set of
> acoustic vectors.  We can build a model of the variation of these that we
> may reasonably see in the database.

So these "variations" constitute the elusive "all possible words"?

> We can build models of the
> background noise in the database.  We can combine both models and
> perform a match.  If you want details, one representation would be to
> use FFT powers binned on a perceptual scale and hidden Markov models
> for the target and background.  The target and background models can be
> combined by multiplying out the states (parallel model combination) and,
> assuming the target and background are uncorrelated, the observation
> powers sum, as do the variances.  You can then do a Viterbi alignment and
> look for the log likelihood difference between your target word occurring
> and not occurring.  I'm not saying this is the best method, just it's
> the first that comes to mind, and perhaps the one I'd start with if I
> was a contractor tasked with this, or if I was supervising a research
> student/research fellow in this area.
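Tony's recipe can be made concrete with a toy. This is a sketch under stated assumptions, not his or anyone's production system: states are assumed to be diagonal Gaussians over linear-domain band powers (real parallel model combination usually maps log or cepstral parameters into the linear domain first), and every function name here is hypothetical. Each (target state, background state) pair becomes one combined state whose mean and variance are the sums of its parts, the joint transition matrix is the Kronecker product of the two chains, and the score is the Viterbi best-path log likelihood with the target present minus that with background only.

```python
import numpy as np

def combine_pmc(t_means, t_vars, t_trans, b_means, b_vars, b_trans):
    """Pair every target state i with every background state j (index i*nb + j).
    Assumes linear-domain power features and uncorrelated sources, so each
    paired state's mean and variance are the sums of its two components."""
    nt, nb = len(t_means), len(b_means)
    means = (t_means[:, None, :] + b_means[None, :, :]).reshape(nt * nb, -1)
    vars_ = (t_vars[:, None, :] + b_vars[None, :, :]).reshape(nt * nb, -1)
    # Joint transitions assume the two Markov chains evolve independently.
    trans = np.kron(t_trans, b_trans)
    return means, vars_, trans

def gauss_loglik(obs, means, vars_):
    """Diagonal-Gaussian log-likelihood of every frame under every state, shape (T, S)."""
    diff = obs[:, None, :] - means[None, :, :]
    return -0.5 * np.sum(np.log(2 * np.pi * vars_)[None] + diff**2 / vars_[None], axis=2)

def viterbi_score(obs, means, vars_, trans, init):
    """Log-probability of the single best state path through the model."""
    with np.errstate(divide="ignore"):  # log(0) -> -inf marks forbidden moves
        log_trans, log_init = np.log(trans), np.log(init)
    ll = gauss_loglik(obs, means, vars_)
    delta = log_init + ll[0]
    for t in range(1, len(ll)):
        delta = np.max(delta[:, None] + log_trans, axis=0) + ll[t]
    return float(np.max(delta))

def detection_llr(obs, target, background):
    """Best-path score with the target present (combined model) minus background only.
    target and background are (means, vars, trans, init) tuples."""
    t_m, t_v, t_a, t_pi = target
    b_m, b_v, b_a, b_pi = background
    m, v, a = combine_pmc(t_m, t_v, t_a, b_m, b_v, b_a)
    present = viterbi_score(obs, m, v, a, np.kron(t_pi, b_pi))
    absent = viterbi_score(obs, b_m, b_v, b_a, b_pi)
    return present - absent

# Hypothetical 2-state left-to-right target and 1-state background over 3 bands.
target = (np.array([[4.0, 0, 0], [0, 4.0, 0]]), np.full((2, 3), 0.5),
          np.array([[0.5, 0.5], [0.0, 1.0]]), np.array([1.0, 0.0]))
background = (np.ones((1, 3)), np.full((1, 3), 0.5), np.array([[1.0]]), np.array([1.0]))
obs = np.array([[5.0, 1, 1], [5, 1, 1], [1, 5, 1], [1, 5, 1]])
print(detection_llr(obs, target, background))  # positive: target-present model fits better
```

The toy assumes the target spans the whole segment; a real word spotter would let the best path enter and leave the target model inside a longer stretch of background.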

What you have outlined is maths. Maths works with data represented as
numbers. The DSP algorithms I know of take numbers as arguments,
perform arithmetical operations on the numbers and produce numbers
as output.

Could you please indicate how you expect to design a DSP algorithm
that produces a "word" -- a concept which has yet to be defined --  as
output?

> > Apart from that, you might find it interesting to read up on
> > Weierstrass' (sp?) representation theorem. It basically says that
> > any data sequence can be represented arbitrarily well by any
> > set of linearly independent basis functions.
>
> Sounds perfectly reasonable to me.
>
> > So basically, you can take any signal and match it to any
> > of your sounds in your library. Subtract the template sound that
> > matches best, and remove the template from the library. Then
> > match the remains of the signal against the remains of the library.
> > Some other sound will be the one that matches best. Subtract
> > this from the signal and remove the template from the library.
> > Repeat this match - subtract - remove process until either the
> > residual == 0 or you have compared the signal with the whole
> > library. I'll almost guarantee that you still have a non-zero residual
> > by the time you run out of templates.
> >
> > The Weierstrass theorem is a real killer for most "bright"
> > signal analysis ideas based on template matching.
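The match, subtract, remove loop described above is essentially greedy matching pursuit with each template used at most once. A minimal sketch, assuming the templates are fixed-length, time-aligned vectors (Tony notes in reply that real entries won't be in phase); `match_subtract_remove` is a hypothetical name. With ten random templates in a 64-sample signal space, every template correlates with the signal to some degree, and the residual is still non-zero when the library runs out; only a library that spans the space drives it to zero.

```python
import numpy as np

def match_subtract_remove(signal, library):
    """Greedy match-subtract-remove: repeatedly pick the unused template whose
    normalised inner product with the residual is largest, subtract its
    least-squares-scaled copy, and drop it from the library."""
    residual = np.asarray(signal, dtype=float).copy()
    remaining = list(range(len(library)))
    order = []
    while remaining and np.linalg.norm(residual) > 1e-12:
        scores = [abs(residual @ library[k]) / np.linalg.norm(library[k]) for k in remaining]
        best = remaining[int(np.argmax(scores))]
        atom = np.asarray(library[best], dtype=float)
        residual -= (residual @ atom) / (atom @ atom) * atom
        order.append(best)
        remaining.remove(best)
    return order, residual

rng = np.random.default_rng(1)
signal = rng.standard_normal(64)
library = [rng.standard_normal(64) for _ in range(10)]
order, residual = match_subtract_remove(signal, library)
print(order, np.linalg.norm(residual))  # all 10 templates used, residual still non-zero
```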
>
> Well, in this case I wouldn't expect the target and database entries to
> be in phase, so simple subtraction wouldn't get you anywhere.  But your
> point seems to be that you can match any target to any template to a
> certain degree

No. I can match any target waveform to any library of references and
get a match to every single one (assuming perfect arithmetic).

> and from this you draw the conclusion that the task is
> impossible.

Yes, at least by the technique of template matching.

> That's not a valid step, all that we need to do is
> establish a degree of confidence that the target is embedded in the
> background, and that is certainly possible to do.

What confidence will that be? I have just argued on the basis
of the Weierstrass theorem -- which you apparently agreed to --
that an algorithm will match any library to any data you throw at it.

What can you possibly learn from such an exercise that you did
not already know?

Rune

Reply by morgan ●August 26, 2006

My colleague Dave Gelbart kindly posted this apparently controversial
ad for me, and Tony Robinson has stuck up for me; but it's probably a
good idea for me to interject a few things myself. Tony knows both ICSI
and my work very well, so he's a good defender, but he's working
with the disadvantage of not knowing about this specific project. Dave
works for ICSI and knows the general idea, but I haven't expended
much effort explaining this project to him either. Maybe this posting
can clarify things.

First off, mea culpa for not specifying what our research goals are
more clearly. Sorry if this led to some unhappy speculations. The
project was absolutely underspecified [in the ad] since it wasn't a
technical white paper but just an ad to attract curious interest. (It
certainly did that, but maybe not in the way I had intended). The ad is
for graduating students, to pique their interest, and anyone who
inquired would get a fuller explanation of the project. I didn't use
potentially clarifying jargon (like wordspotting) since I was trying to
reach out beyond the usual group who would know a lot about this kind
of research; also I wanted to include nonword targets like laughter.

The white paper and proposal submitted to the sponsor said just what we
intended to do (I assure you I didn't try to snow them), which was
certainly NOT to model every possible interference, but just to improve
robustness to the normal and natural situation of having input which is
not the desired item. I'll expand on that below:

1) Project goals: in any real speech recognition or wordspotting
application, you get inputs which are not in your anticipated
inventory. That is sometimes called the OOV (out-of-vocabulary)
problem, and more generally refers to "rejection", namely the
rejection of any input that is not what you are trying to recognize or
detect. This is what you really face in the real world, and is an
interesting research problem as well as a practical one for companies.
Some of the difficulties you guys have pointed out are precisely why
research is needed. I was stating it in a very broad way, but basically
the problem is that you have an inventory of models, but you know that
input will come that is not in your inventory - what do you do? Every
system does something for this, and we are trying to make progress on
it. We are making no promise to the sponsor except that we will work on
it, using approaches described below.

2) Project methods: I didn't expand on this (or on much of anything
technical in the ad), but we are trying to learn from human
performance. We have had some good successes (where "success" in
our research means that we improved some reasonable metric, not that we
made everything perfect) using some other human-inspired signal
processing, but in this case we are going a bit further by taking
advantage of models that have been developed at the University of
Maryland from cortical measurements.

3) Point about matching a model to data - of course you guys are
right that if you have a hundred models and you get new data that
does not really correspond to one of your models, you will get some
level of match to all of them. But what is commonly done is that you
require some goodness of fit or else you revert to the null hypothesis
that it doesn't fit any of your models. This isn't science fiction:
many, many working systems do something like this. And since you
don't know what the non-fitting input is, it really could be
anything. But you're right that this is hard to do, which is why
it's research. Some years ago Apple tried to do a live demo of a
speech recognizer called "Casper" on national TV, and ran into
these limitations. The mode of operation was to say, "Casper: pull up
a window." Or "Casper, delete this file." Etc. The recognizer was
always on, but it didn't initiate any operations until it heard
"Casper", no matter what was happening acoustically. It ended up
being a pretty embarrassing demo, since while Sculley was talking about
how great it was, it misrecognized something else he said as
"Casper", and began doing all sorts of unwanted operations while
the screen was still visible to the TV audience. So the detection of
"Casper" amid unknown words and other sounds (laughter of the
audience, etc.) was absolutely imperfect. What you want to do in such
research problems is reduce the number of false detections and also
reduce the number of misses. And that's all we hope to do.

4) What we expect from postdocs - as with our PhD students, we want
to give them challenging problems and work with them to make progress.
It's great when groundbreaking revolutionary things happen, but we
don't expect that - we expect good ideas, hard work, and fun making
progress on a tough problem.

Anyway, sorry if the vagueness of my minimal technical description made
it sound like I was promising the moon, but in my own mind what I was
referring to was the real problem of speech recognition (although more
specifically, word spotting); you just don't know what will be coming
into the microphone, even if you think you do, and that's what you
have to contend with.

I hope this clarifies things. I haven't responded to a usenet posting
in about a quarter century (I've been busy) but this seemed
worthwhile since everyone was guessing what I meant (and, perhaps
justifiably, condemning the fact that I hadn't made it clear).

Reply by Tony Robinson ● August 26, 2006

Jerry Avins <jya@ieee.org> writes:

> Tony Robinson wrote:
> 
>    ...
> 
> > As I said earlier on, I don't see why it's so impossible to consider a
> > distribution over all possible sounds.  I'm by no means advocating that
> > you can give all possible sounds equal weighting, I don't think that
> > defines a meaningful distribution.  But there are many possible
> > distributions you can consider - for example it's not impossible for
> > many people to carry around a sound recorder 24 hours a day for say a
> > year and then use that.
> 
> As far as I can see, it is not possible to catalog all possible sounds,
> to say nothing about testing them.

Given a p.d.f. over length, the number of bits in a 16-bit, 16 kHz wav file is
finite.  We can say much more than that - we can say that they are words
or realistic sounds.  This space can be parameterised and modelled.
I'm not saying that you can catalog it or test against every
possibility, but you can certainly evaluate against a model of the space.
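To put rough numbers on the finiteness argument, here is a minimal sketch, assuming mono 16-bit PCM at 16 kHz and ignoring the wav header: a clip of N samples holds 16N bits and so can take 2^(16N) distinct values. The helper below (a hypothetical name) reports how many decimal digits that count has, since the count itself is far too large to print usefully.

```python
import math


def num_distinct_clips_digits(
    seconds: float, sample_rate: int = 16_000, bits_per_sample: int = 16
) -> int:
    """Decimal digits in 2**(bits_per_sample * samples), i.e. in the number of
    distinct mono PCM clips of the given duration (file headers ignored)."""
    total_bits = bits_per_sample * int(sample_rate * seconds)
    # 2**n has floor(n * log10(2)) + 1 decimal digits.
    return math.floor(total_bits * math.log10(2)) + 1


print(num_distinct_clips_digits(1.0))  # 77064
```

One second of audio already gives a count with 77,064 digits. The space is finite but hopeless to enumerate, which is why the point is to evaluate against a model of the space rather than catalog it.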

> > There's certainly a lot of cynicism and distrust here, I'm sure that
> > this is unfounded with respect to this job ad.
> 
> No distrust here, only sadness and disgust that supposedly educated
> people so often fail to avoid ambiguity in what they write even after
> careful consideration of the text. When the reader is left to fill in
> gaps or discard untenable interpretations, it becomes very unlikely that
> writer and reader will see all details in the same light.

Okay Jerry, I'll ask you what you personally find ambiguous.  Rune
clearly has his own axe to grind based on his life history but I'm
trying to get to the core of what you and he find so objectionable about
the phrase "the detection of one particular target word and/or sound in
the background of all other possible words or any other realistic
sounds".

Tony

Reply by Tony Robinson ● August 26, 2006

Okay Rune, I'll let you rant on.  I've seen your name on comp.dsp many
times and it was my (apparently mistaken) belief that your frequency of
posting commanded some respect.  I've put very many hours into this and it
just seems to me that you have no coherent argument and just want to
pick a fight.  I don't think the rest of the world wants to listen to
you putting up straw men, and now that Morgan has posted and I'm clearly
only encouraging you, I'll shut up.

*plonk*, as they used to say.

Reply by Jerry Avins ● August 26, 2006

Tony Robinson wrote:

   ...

> Okay Jerry, I'll ask you what you personally find ambiguous.  Rune
> clearly has his own axe to grind based on his life history but I'm
> trying to get to the core of what you and he find so objectionable about
> the phrase "the detection of one particular target word and/or sound in
> the background of all other possible words or any other realistic
> sounds".

Don't you think that "a wide variety of" is more accurate than "all
possible"? Have we become so accustomed to hyperbole that we can't
recognize it when it's pointed out? Would you think an algorithm's proof
could reasonably include testing with "all possible integers"? How is
that different from "all possible words"?

My standard for serious editing -- a goal, not an achievement -- is to 
so cast each sentence that a malicious nit-picker can't misconstrue it. 
In these informal discussions, I fall far short of technical writing, 
and even that needs improving. Nevertheless, I distinguish between "all 
x is not y" and "not all x is y", and "I only want one cup of coffee" 
and "I want only one cup of coffee". It's a national shame that most
scholars don't.

Jerry
-- 
Engineering is the art of making what you want from things you can get.

Reply by jim ● August 26, 2006


Jerry Avins wrote:
> Would you think an algorithm's proof
> could reasonably include testing with "all possible integers"?

I don't know why not. Name an integer that won't work with my algorithm.

> How is
> that different from "all possible words"?
>
> My standard for serious editing -- a goal, not an achievement -- is to
> so cast each sentence that a malicious nit-picker can't misconstrue it.

Yeah, right. I think you would pick even a nit's nit. This is so beyond
nit-picking it's ridiculous.

However, in advertising they say even bad publicity is good, so I guess
the OP can't really complain about all the attention his ad is getting,
no matter how unfair the criticism might be.

-jim


> In these informal discussions, I fall far short of technical writing,
> and even that needs improving. Nevertheless, I distinguish between "all
> x is not y" and "not all x is y", and "I only want one cup of coffee"
> and "I want only one cup of coffee". It's a national shame that most
> scholars don't.
>
> Jerry
> --
> Engineering is the art of making what you want from things you can get.


Reply by Steve Underwood ● August 26, 2006

Tony Robinson wrote:
> Jerry Avins <jya@ieee.org> writes:
> 
> 
>>Tony Robinson wrote:
>>
>>   ...
>>
>>
>>>As I said earlier on, I don't see why it's so impossible to consider a
>>>distribution over all possible sounds.  I'm by no means advocating that
>>>you can give all possible sounds equal weighting, I don't think that
>>>defines a meaningful distribution.  But there are many possible
>>>distributions you can consider - for example it's not impossible for
>>>many people to carry around a sound recorder 24 hours a day for say a
>>>year and then use that.
>>
>>As far as I can see, it is not possible to catalog all possible sounds,
>>to say nothing about testing them.
> 
> 
> Given a p.d.f. over length, the number of bits in a 16-bit, 16 kHz wav file is
> finite.  We can say much more than that - we can say that they are words
> or realistic sounds.  This space can be parameterised and modelled.
> I'm not saying that you can catalog it or test against every
> possibility, but you can certainly evaluate against a model of the space.
> 
> 
>>>There's certainly a lot of cynicism and distrust here, I'm sure that
>>>this is unfounded with respect to this job ad.
>>
>>No distrust here, only sadness and disgust that supposedly educated
>>people so often fail to avoid ambiguity in what they write even after
>>careful consideration of the text. When the reader is left to fill in
>>gaps or discard untenable interpretations, it becomes very unlikely that
>>writer and reader will see all details in the same light.
> 
> 
> Okay Jerry, I'll ask you what you personally find ambiguous.  Rune
> clearly has his own axe to grind based on his life history but I'm
> trying get to the core of what you and he find so objectionable about
> the phrase "the detection of one particular target word and/or sound in
> the background of all other possible words or any other realistic
> sounds".
> 

Puzzling, isn't it? The ad specifies a research goal. Typically in signal
processing those are not things you either meet or fail to meet, and
everyone here has worked with that their whole career. Goals are usually
elastic - e.g. "The radio technique we researched turned out to work
pretty well in a variety of circumstances. Very competitive with current
alternatives, and cheaper. I think we have a winner, unless....". In the
ad, the research goal is to detect a sound amongst arbitrary ones. Only
a fool would judge the outcome on a simple pass/fail basis. The sane
approach is to judge it elastically, based on the rate of
false alarms and failures to alarm.
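That elastic judgement is usually expressed as two rates. Here is a minimal sketch, assuming the audio has already been cut into segments with one ground-truth label and one detector decision per segment (an assumption; real word-spotting evaluations often score time-aligned hits instead). The function name is hypothetical.

```python
from typing import Sequence, Tuple


def detection_error_rates(
    truth: Sequence[bool], detected: Sequence[bool]
) -> Tuple[float, float]:
    """Return (miss_rate, false_alarm_rate) for per-segment binary decisions.

    miss_rate        = missed targets / segments that contain the target
    false_alarm_rate = false detections / segments without the target
    """
    if len(truth) != len(detected):
        raise ValueError("truth and detected must have the same length")
    positives = sum(1 for t in truth if t)
    negatives = len(truth) - positives
    misses = sum(1 for t, d in zip(truth, detected) if t and not d)
    false_alarms = sum(1 for t, d in zip(truth, detected) if d and not t)
    # Guard against empty classes so both rates stay defined.
    miss_rate = misses / positives if positives else 0.0
    false_alarm_rate = false_alarms / negatives if negatives else 0.0
    return miss_rate, false_alarm_rate


# Hypothetical labels for five segments: the target occurs in the first two.
truth = [True, True, False, False, False]
detected = [True, False, True, False, False]
print(detection_error_rates(truth, detected))  # (0.5, 0.3333333333333333)
```

A detector that alarms on every segment drives the miss rate to zero and the false-alarm rate to one, so neither number means much alone. The two are normally reported together, or traded off against each other by sweeping a detection threshold.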

Seems like more people than Rune are having a bad week. :-)

Regards,
Steve

Reply by John Fry ● August 27, 2006

Jerry Avins <jya@ieee.org> writes:

> Nevertheless, I distinguish between "all x is not y" and "not all x
> is y", and "I only want one cup of coffee" and "I want only one cup
> of coffee". It's a national shame that most scholars don't.

Quantifier scope depends on context.  That's how language works.  No
shame in that.

Best,

John

Reply by Rune Allnor ● August 27, 2006

morgan wrote:
> My colleague Dave Gelbart kindly posted this apparently controversial
> ad for me, and Tony Robinson has stuck up for me; but it's probably a
> good idea for me to interject a few things myself.
 [-- snip --]

Morgan,

Your post certainly gives a somewhat different impression of your
group than the ad did.

Language is a funny thing. It is the only means people have to
communicate in any elaborate way with each other.  Most people
don't read other people's minds, so they can only relate to others
through what they hear or read. The choice of words and phrasing
tends to make an impression on the listener or reader; people usually
have no choice but to take what they hear or read at face value.
Anything else would amount to guesswork, second-guessing and
so on.

If you really want to attract new people, use terms that professional
generalists can understand. An ad like this is not aimed at an
arbitrary John Doe, so technical terms in general are permitted.

Failing to understand the importance of precise language would
lead to a rapid decline of your group. Probably not while you
are in charge, but you might risk getting a successor who has
not contemplated the lingo he or she hears in the halls, and who
acts on the general, roundabout terms as if they were to be taken
literally.

If -- when -- that happens, all hell breaks loose. Believe me.

You built that group. You set the standards. Now you know the risk.

Rune
