DSPRelated.com
Forums

Post-doc in Acoustic Event Detection (US citizens only)

Started by Unknown August 24, 2006
tony.nospam@nospam.tonyRobinson.com wrote:
> "Rune Allnor" <allnor@tele.ntnu.no> writes:
>
> > tony.nospam@nospam.tonyRobinson.com wrote:
> > > So your problem is one of coverage? I see no claim that the target has
> > > to occur in any database.
> >
> > So how are you going to find it?
>
> Who says that you are going to?
I admit that I slipped to your level of precision; I will make an effort to keep that from happening again. I used "find" as a synonym for "detect". The ad stated "detection of..."
> I think it perfectly reasonable that you output a confidence that the
> target word appears in every database entry.
OK, this is ridiculous. You obviously differentiate between "find" and "detect", so I'll join you in that game: you are going to "detect" a target but not "find" it... In my world the difference between the two amounts to the difference between the somewhat general statement "a target is present in the scenario" and "the target is located at position (x,y,z)" [for the record: I use "find" as yet another synonym, this time for "locate"]. I don't have the slightest clue what you mean when you defend the ad, where "detection" is a key element, and then object to my use of "find".
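The distinction drawn here between "detect" (is the target present at all?) and "find"/"locate" (where is it?) corresponds to two different readouts of the same matched-filter output. A minimal NumPy sketch, assuming a known target waveform, white Gaussian background noise and an illustrative threshold of 0.5; the function names and the synthetic signals are hypothetical, not anything from the ad or the thread.

```python
import numpy as np


def normalized_xcorr(x, template):
    """Normalized cross-correlation of a template against every offset of x."""
    n = len(template)
    t = template - template.mean()
    t = t / np.linalg.norm(t)
    scores = np.zeros(len(x) - n + 1)
    for k in range(len(scores)):
        seg = x[k:k + n] - x[k:k + n].mean()
        norm = np.linalg.norm(seg)
        if norm > 0:
            scores[k] = seg @ t / norm
    return scores


def detect_and_locate(x, template, threshold):
    """Detection: does the best match exceed the threshold?
    Localization: at which sample offset does the best match occur?"""
    scores = normalized_xcorr(x, template)
    peak = int(np.argmax(scores))
    return bool(scores[peak] > threshold), peak


rng = np.random.default_rng(0)
template = rng.standard_normal(200)        # hypothetical known target
noise = 0.5 * rng.standard_normal(2000)    # hypothetical background

x = noise.copy()
x[700:900] += template                     # embed the target at sample 700

print(detect_and_locate(x, template, threshold=0.5))      # target present
print(detect_and_locate(noise, template, threshold=0.5))  # background only
```

The first value of each pair answers the detection question; the second is only meaningful as a location once detection has succeeded.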
> > > Now we populate our database with sounds, it might be a parrot saying
> > > "I know what Ambiguous means!", a computer synthesising "Usenet ranting
> > > is a waste of time" or anything else. The patterns can be any sound
> > > which covers all possible words. Are you okay with this or would you
> > > like to argue that you'll need an infinite database to store all
> > > possible words?
> >
> > The signal can be any sound, then. Whether it is a word or anything
> > else is unspecified, as far as I am concerned.
>
> Yes, I've generalised to any sound so that you're not concerned with the
> complexities of some language that is not available to the researcher.
> Let's keep things simple if we can.
Ah, what is this? Are words that belong to "some language that is not available to the researcher" by any chance "impossible", given the ad's explicit mention of "all possible words"? Why this simplification all of a sudden?
> > > We run our matching algorithm, it comes up with a result, perhaps it's
> > > good, perhaps not. Are you happy with this or do you expect perfect
> > > detection?
> >
> > I can't see what relevance this matching has to anything useful.
> > Any matching algorithm will produce a result on any data. Whether
> > the match makes sense or is useful depends on whether the
> > processing algorithm is based on a representation that
> > matches the data.
>
> True enough. I'm not arguing for any particular algorithm, just that
> it's reasonable to allow for any sound to be the target and any sound to
> be in the database as background. This seemed to be one of your main
> objections.
My main objection is that you claim to be able to search through "all possible words".
> > What makes you think you can find such an elusive
> > signal structure as a "word" when DSP methods that
> > have been researched extensively for 30 years cannot
> > tell whether or not a signal segment really contains such
> > a well-defined structure as a monochromatic sinusoid?
>
> It's easy to add sufficient noise to any pattern matching problem such
> that the probability of detection becomes vanishingly small. This is as
> true of a target word in background noise as it is of a single sinusoid
> in noise.
No. Hiding words is easier. The term "sinusoid" has a very specific meaning and a very specific mathematical representation. Mathematical algorithms can be designed that take advantage of these properties and push the limits of what can be detected.
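As an illustration of how a sinusoid's known mathematical form can be exploited, here is a minimal sketch of one classical approach: a periodogram peak test. It assumes white Gaussian noise, a tone that falls exactly on an FFT bin, and the peak-to-mean bin ratio as the test statistic. These are illustrative choices, not the specific methods either poster had in mind, and the function name is hypothetical.

```python
import numpy as np


def periodogram_peak_ratio(x):
    """Ratio of the largest periodogram bin to the mean bin (DC excluded).

    A sinusoid concentrates its energy in one FFT bin while white noise
    spreads evenly over all bins, so this ratio grows with record length
    for a sinusoid but stays roughly constant for noise alone.
    """
    p = np.abs(np.fft.rfft(x)) ** 2
    p = p[1:]  # drop the DC bin
    return p.max() / p.mean()


rng = np.random.default_rng(1)
n = 4096
k = np.arange(n)
tone = 0.25 * np.sin(2 * np.pi * 256 * k / n)  # sits exactly on bin 256
noise = rng.standard_normal(n)                  # per-sample SNR of about -15 dB

print(periodogram_peak_ratio(noise))          # noise alone
print(periodogram_peak_ratio(noise + tone))   # tone buried in noise
```

The test works only because the target's structure (a single frequency) is known in advance; there is no comparably compact description of a "word" to build such a statistic on.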
> This just makes it a harder task and more interesting for some
> people -
Maybe. There are people whose purpose in life is to prove the existence of the Loch Ness monster, the Yeti or Sasquatch (sp?).
> not an irresponsibly worded advert or indication of evil
> employers out to depress new PhDs into abandoning their careers.
I am not trying to depress PhD students into abandoning their careers. I am using my own experiences to warn them to watch out for tell-tale signs of bad projects.
> > > You've clearly got a problem matching one sound against all other
> > > possible sounds based on your past experience, I just don't see how
> > > that relates to this job ad.
> >
> > OK, define a "word". Does a human have to utter the sound, or is it
> > sufficient that the parrot utters it? Is everything a human utters a
> > "word"?
> > Having defined the "word" somehow, how do you separate a "word" in a
> > recording from a mixture involving other types of sounds?
>
> This is just irrelevant detail, I still don't know why you are hung up
> on it.
Because stating a defining property of whatever you are looking for is the key to finding it. You have made every possible twist and turn to avoid coming up with a definition. Maybe Sherlock Holmes can tell Dr Watson to search an apartment for something and answer "you will know it when you see it" when asked what to search for. As you hopefully know, Sherlock Holmes and Dr Watson are fictitious characters. You -- and your PhD students -- live in real life. You will have to know what to look for in order to find it.
> We don't know the details of the project but it's reasonable to
> assume we've got an instance of the target "word", so it's just a set of
> acoustic vectors. We can build a model of the variation of these that we
> may reasonably see in the database.
So these "variations" constitute the elusive "all possible words"?
> We can build models of the
> background noise in the database. We can combine both models and
> perform a match. If you want details, one representation would be to
> use FFT powers binned on a perceptual scale and hidden Markov models
> for the target and background. The target and background models can be
> combined by multiplying out the states (parallel model combination) and,
> assuming the target and background are uncorrelated, the observation
> powers sum, as do the variances. You can then do a Viterbi alignment and
> look for the log likelihood difference between your target word occurring
> and not occurring. I'm not saying this is the best method, just it's
> the first that comes to mind, and perhaps the one I'd start with if I
> was a contractor tasked with this, or if I was supervising a research
> student/research fellow in this area.
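The recipe outlined here (combine target and background models, Viterbi-align, compare log likelihoods with and without the target) can be sketched in a few dozen lines. This is a toy under stated assumptions, not Tony's or ICSI's implementation: features are synthetic 8-band linear powers standing in for perceptual-scale FFT powers, emissions are diagonal Gaussians, the target is a 3-state left-to-right HMM and the background a single state. The combination step uses only the simplified rule from the post (means add, variances add); the full parallel model combination technique also maps between cepstral/log and linear domains, which is omitted. All names and numbers are hypothetical.

```python
import numpy as np


def gauss_logpdf(obs, mean, var):
    """Per-frame log density of a diagonal Gaussian; obs is (T, D)."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (obs - mean) ** 2 / var, axis=1)


def viterbi_score(obs, means, variances, p_stay=0.5):
    """Best-path log likelihood through a left-to-right HMM with self-loops.

    The path starts in the first state and ends in the last; at each frame a
    state either repeats (p_stay) or advances to the next state (1 - p_stay).
    """
    log_stay, log_move = np.log(p_stay), np.log(1.0 - p_stay)
    emit = np.stack([gauss_logpdf(obs, m, v) for m, v in zip(means, variances)], axis=1)
    delta = np.full(len(means), -np.inf)
    delta[0] = emit[0, 0]
    for t in range(1, len(obs)):
        move = np.concatenate(([-np.inf], delta[:-1] + log_move))
        delta = np.maximum(delta + log_stay, move) + emit[t]
    return delta[-1]


def combine(target_means, target_vars, bg_mean, bg_var):
    """Simplified model combination in the linear power domain:
    for uncorrelated target and background, powers add and so do variances."""
    return target_means + bg_mean, target_vars + bg_var


# Hypothetical 8-band power features and a 3-state target model.
target_means = np.array([
    [2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0],
])
target_vars = np.full_like(target_means, 0.1)
bg_mean, bg_var = np.ones(8), np.full(8, 0.1)

comb_means, comb_vars = combine(target_means, target_vars, bg_mean, bg_var)
# "Target occurs": background, then the combined target states, then background.
word_means = np.vstack([bg_mean, comb_means, bg_mean])
word_vars = np.vstack([bg_var, comb_vars, bg_var])


def log_likelihood_ratio(obs):
    """Viterbi score with the target minus the score of background alone."""
    return (viterbi_score(obs, word_means, word_vars)
            - viterbi_score(obs, bg_mean[None, :], bg_var[None, :]))


rng = np.random.default_rng(2)


def frames(mean, var, n):
    """Draw n synthetic feature frames from a diagonal Gaussian."""
    return rng.normal(mean, np.sqrt(var), size=(n, len(mean)))


absent = frames(bg_mean, bg_var, 50)
present = np.vstack([frames(bg_mean, bg_var, 10)]
                    + [frames(m, v, 10) for m, v in zip(comb_means, comb_vars)]
                    + [frames(bg_mean, bg_var, 10)])

print(log_likelihood_ratio(present))  # target embedded in background
print(log_likelihood_ratio(absent))   # background only
```

In practice the ratio would be compared against a tuned threshold rather than zero, and the background model would be trained on real data rather than assumed.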
What you have outlined is maths. Maths work with data represented as numbers. The DSP algorithms I know of take numbers as arguments, perform arithmetical operations on the numbers and produce numbers as output. Could you please indicate how you expect to design a DSP algorithm that produces a "word" -- a concept which has yet to be defined -- as output?
> > Apart from that, you might find it interesting to read up on
> > Weierstrass' (sp?) representation theorem. It basically says that
> > any data sequence can be represented arbitrarily well by any
> > set of linearly independent basis functions.
>
> Sounds perfectly reasonable to me.
>
> > So basically, you can take any signal and match it to any
> > of your sounds in your library. Subtract the template sound that
> > matches best, and remove the template from the library. Then
> > match the remains of the signal against the remains of the library.
> > Some other sound will be the one that matches best. Subtract
> > this from the signal and remove the template from the library.
> > Repeat this match - subtract - remove process until either the
> > residual == 0 or you have compared the signal with the whole
> > library. I'll almost guarantee that you still have a non-zero residual
> > by the time you run out of templates.
> >
> > The Weierstrass theorem is a real killer for most "bright"
> > signal analysis ideas based on template matching.
>
> Well, in this case I wouldn't expect the target and database entries to
> be in phase, so simple subtraction wouldn't get you anywhere. But your
> point seems to be that you can match any target to any template to a
> certain degree
No. I can match any target waveform to any library of references and get a match to every single one (assuming perfect arithmetic).
> and from this you draw the conclusion that the task is
> impossible.
Yes, at least by the technique of template matching.
> That's not a valid step, all that we need to do is
> establish a degree of confidence that the target is embedded in the
> background, and that is certainly possible to do.
What confidence will that be? I have just argued on the basis of the Weierstrass theorem -- which you apparently agreed to -- that an algorithm will match any library to any data you throw at it.

What can you possibly learn from such an exercise that you did not already know?

Rune
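Rune's match - subtract - remove procedure can be sketched as a greedy loop, similar in spirit to matching pursuit except that each template is used only once. A minimal NumPy sketch, assuming random vectors stand in for sound templates and that each template is scaled by least squares before subtraction (the post does not say how the subtraction is scaled); the function name is hypothetical.

```python
import numpy as np


def match_subtract_remove(signal, library):
    """Greedy match - subtract - remove loop over a template library.

    At each step the remaining template with the largest correlation with
    the residual is fitted by least squares, subtracted from the residual,
    and removed from the library. Returns the order in which templates were
    used, the residual energy after each step, and the final residual.
    """
    residual = np.asarray(signal, dtype=float).copy()
    remaining = list(range(len(library)))
    order, energies = [], []
    while remaining and residual @ residual > 1e-24:
        units = [library[i] / np.linalg.norm(library[i]) for i in remaining]
        corrs = [abs(u @ residual) for u in units]
        best = int(np.argmax(corrs))
        u = units[best]
        residual = residual - (u @ residual) * u   # subtract the scaled best match
        order.append(remaining.pop(best))
        energies.append(float(residual @ residual))
    return order, energies, residual


rng = np.random.default_rng(3)
n = 64
signal = rng.standard_normal(n)             # any signal at all
library = rng.standard_normal((16, n))      # 16 unrelated "templates"

# Every template correlates with the signal to some nonzero degree...
print(np.round(library @ signal / np.linalg.norm(library, axis=1), 2))

# ...and the residual energy shrinks at each step but stays above zero once
# the 16 templates run out, because they cannot span all 64 dimensions.
order, energies, residual = match_subtract_remove(signal, library)
print(len(order), signal @ signal, energies[-1])
```

With 64 orthonormal templates the same loop would drive the residual to zero up to rounding; with an incomplete library it cannot, which is the behaviour Rune predicts. Whether a nonzero match score carries any evidence is exactly what the thread goes on to argue about.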
My colleague Dave Gelbart kindly posted this apparently controversial ad for me, and Tony Robinson has stuck up for me, but it's probably a good idea for me to interject a few things myself. Tony knows both ICSI and my work very well, so he's a good defender, but he's working with the disadvantage of not knowing about this specific project. Dave works for ICSI and knows the general idea, but I haven't expended much effort explaining this project to him either. Maybe this posting can clarify things.

First off, mea culpa for not specifying our research goals more clearly. Sorry if this led to some unhappy speculation. The project was absolutely underspecified in the ad, since it wasn't a technical white paper but just an ad to attract curious interest. (It certainly did that, but maybe not in the way I had intended.) The ad is for graduating students, to pique their interest, and anyone who inquired would get a fuller explanation of the project. I didn't use potentially clarifying jargon (like wordspotting) since I was trying to reach out beyond the usual group who would know a lot about this kind of research; also, I wanted to include nonword targets like laughter. The white paper and proposal submitted to the sponsor said just what we intended to do (I assure you I didn't try to snow them), which was certainly NOT to model every possible interference, but just to improve robustness to the normal and natural situation of having input which is not the desired item. I'll expand on that below:

1) Project goals: in any real speech recognition or wordspotting application, you get inputs which are not in your anticipated inventory. That is sometimes called the OOV (out-of-vocabulary) problem, and more generally refers to "rejection", namely the rejection of any input that is not what you are trying to recognize or detect. This is what you really face in the real world, and it is an interesting research problem as well as a practical one for companies. Some of the difficulties you guys have pointed out are precisely why research is needed. I was stating it in a very broad way, but basically the problem is that you have an inventory of models, but you know that input will come that is not in your inventory - what do you do? Every system does something for this, and we are trying to make progress on it. We are making no promise to the sponsor except that we will work on it, using the approaches described below.

2) Project methods: I didn't expand on this (or on much of anything technical in the ad), but we are trying to learn from human performance. We have had some good successes (where "success" in our research means that we improved some reasonable metric, not that we made everything perfect) using other human-inspired signal processing, but in this case we are going a bit further by taking advantage of models that have been developed at the University of Maryland from cortical measurements.

3) Point about matching a model to data: of course you guys are right that if you have a hundred models and you get new data that does not really correspond to one of your models, you will get some level of match to all of them. But what is commonly done is that you require some goodness of fit, or else you revert to the null hypothesis that the input doesn't fit any of your models. This isn't science fiction - many, many working systems do something like this. And since you don't know what the non-fitting input is, it really could be anything. But you're right that this is hard to do, which is why it's research.

Some years ago Apple tried to do a live demo of a speech recognizer called "Casper" on national TV, and ran into these limitations. The mode of operation was to say, "Casper: pull up a window." Or "Casper, delete this file." Etc. The recognizer was always on, but it didn't initiate any operations until it heard "Casper", no matter what was happening acoustically. It ended up being a pretty embarrassing demo, since while Sculley was talking about how great it was, it misrecognized something else he said as "Casper" and began doing all sorts of unwanted operations while the screen was still visible to the TV audience. So the detection of "Casper" in the presence of unknown words or other sounds (audience laughter, etc.) was absolutely imperfect. What you want to do in such research problems is reduce the number of false detections and also reduce the number of misses. And that's all we hope to do.

4) What we expect from postdocs: as with our PhD students, we want to give them challenging problems and work with them to make progress. It's great when groundbreaking, revolutionary things happen, but we don't expect that - we expect good ideas, hard work, and fun making progress on a tough problem.

Anyway, sorry if the vagueness of my minimal technical description made it sound like I was promising the moon, but in my own mind what I was referring to was the real problem of speech recognition (more specifically, word spotting): you just don't know what will be coming into the microphone, even if you think you do, and that's what you have to contend with.

I hope this clarifies things. I haven't responded to a Usenet posting in about a quarter century (I've been busy), but this seemed worthwhile since everyone was guessing what I meant (and, perhaps justifiably, condemning the fact that I hadn't made it clear).
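The "goodness of fit or revert to the null hypothesis" rule from point 3, and the goal of trading off false detections against misses, can be illustrated with a few lines of NumPy. This is a minimal sketch with hypothetical keyword labels and synthetic Gaussian scores; it is not the project's method, only the generic rejection-threshold idea the post describes.

```python
import numpy as np


def classify_with_rejection(scores, labels, threshold):
    """Pick the best-scoring model, but fall back to the null hypothesis
    (None: "not one of my models") if even the best fit is too poor."""
    best = int(np.argmax(scores))
    return labels[best] if scores[best] >= threshold else None


def miss_and_false_alarm_rates(target_scores, nontarget_scores, threshold):
    """Fraction of true targets rejected (misses) and of non-targets
    accepted (false alarms) at a given detection threshold."""
    misses = np.mean(np.asarray(target_scores) < threshold)
    false_alarms = np.mean(np.asarray(nontarget_scores) >= threshold)
    return float(misses), float(false_alarms)


labels = ["Casper", "open", "delete"]   # hypothetical keyword models
print(classify_with_rejection(np.array([-3.0, 1.5, 0.2]), labels, threshold=0.0))   # -> open
print(classify_with_rejection(np.array([-3.0, -1.5, -0.2]), labels, threshold=0.0))  # -> None

# Hypothetical detection scores: targets tend to score higher, but the two
# distributions overlap, so no threshold removes both kinds of error.
rng = np.random.default_rng(4)
target = rng.normal(2.0, 1.0, 1000)
nontarget = rng.normal(0.0, 1.0, 1000)
for th in (0.0, 1.0, 2.0):
    print(th, miss_and_false_alarm_rates(target, nontarget, th))
```

Raising the threshold buys fewer false "Casper" triggers at the price of more missed ones; research of the kind described aims to pull the two score distributions apart so that both rates can drop together.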
Jerry Avins <jya@ieee.org> writes:

> Tony Robinson wrote: > > ... > > > As I said earlier on, I don't see why it's so impossible to consider a > > distribution over all possible sounds. I'm by no means advocating that > > you can give all possible sounds equal weighting, I don't think that > > defines a meaningful distribution. But there are many possible > > distributions you can consider - for example it's not impossible for > > many people to carry around a sound recorder 24 hours a day for say a > > year and then use that. > > As far as I can see, it is not possible to catalog all possible sounds, > to say nothing about testing them.
Given a p.d.f. over length, the number of bits in a 16-bit, 16 kHz WAV file is finite, so the number of distinct files of any given length is finite too. We can say much more than that: we can say that they are words or realistic sounds. This space can be parameterised and modelled. I'm not saying that you can catalog it or test against every possibility, but you can certainly evaluate against a model of the space.
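To make the finiteness point concrete, here is the arithmetic, assuming mono audio and ignoring the WAV header (neither is stated in the post). A clip of duration $T$ seconds at sample rate $f_s$ with $b$ bits per sample holds $N$ bits, so there are $2^N$ distinct clips of that length:

$$
N = b\,f_s\,T = 16 \times 16\,000 \times T = 256\,000\,T \ \text{bits},
\qquad
\#\{\text{clips of length } T\} = 2^{N}.
$$

For $T = 1$ s this is $2^{256000}$, a number with 77,064 decimal digits: finite, but far too large to enumerate, which is why the argument turns to modelling the space rather than cataloguing it.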
> > There's certainly a lot of cynicism and distrust here; I'm sure that > > this is unfounded with respect to this job ad. > > No distrust here, only sadness and disgust that supposedly educated > people so often fail to avoid ambiguity in what they write even after > careful consideration of the text. When the reader is left to fill in > gaps or discard untenable interpretations, it becomes very unlikely that > writer and reader will see all details in the same light.
Okay Jerry, I'll ask you what you personally find ambiguous. Rune clearly has his own axe to grind based on his life history, but I'm trying to get to the core of what you and he find so objectionable about the phrase "the detection of one particular target word and/or sound in the background of all other possible words or any other realistic sounds". Tony
Okay Rune, I'll let you rant on.  I've seen your name on comp.dsp many
times, and it was my (apparently mistaken) belief that your frequency of
posting commanded some respect.  I've put very many hours into this, and it
just seems to me that you have no coherent argument and just want to
pick a fight.  I don't think the rest of the world wants to listen to
you putting up straw men, and since Morgan has now posted and I'm clearly
only encouraging you, I'll shut up.

*plonk*, as they used to say.
Tony Robinson wrote:

   ...

> Okay Jerry, I'll ask you what you personally find ambiguous. Rune > clearly has his own axe to grind based on his life history but I'm > trying to get to the core of what you and he find so objectionable about > the phrase "the detection of one particular target word and/or sound in > the background of all other possible words or any other realistic > sounds".
Don't you think that "a wide variety of" is more accurate than "all possible"? Have we become so accustomed to hyperbole that we can't recognize it when it's pointed out? Would you think an algorithm's proof could reasonably include testing with "all possible integers"? How is that different from "all possible words"? My standard for serious editing -- a goal, not an achievement -- is to so cast each sentence that a malicious nit-picker can't misconstrue it. In these informal discussions, I fall far short of technical writing, and even that needs improving. Nevertheless, I distinguish between "all x is not y" and "not all x is y", and "I only want one cup of coffee" and "I want only one cup of coffee". It's a national shame that most scholars don't. Jerry -- Engineering is the art of making what you want from things you can get

Jerry Avins wrote:
> Would you think an algorithm's proof
> could reasonably include testing with "all possible integers"?
I don't know why not. Name an integer that won't work with my algorithm.
> How is > that different from "all possible words"?
> My standard for serious editing -- a goal, not an achievement -- is to > so cast each sentence that a malicious nit-picker can't misconstrue it.
Yeah, right. I think you would pick even a nit's nit. This is so far beyond nit-picking it's ridiculous. However, in advertising they say even bad publicity is good, so I guess the OP can't really complain about all the attention his ad is getting, no matter how unfair the criticism might be. -jim
> In these informal discussions, I fall far short of technical writing, > and even that needs improving. Nevertheless, I distinguish between "all > x is not y" and "not all x is y", and "I only want one cup of coffee" > and "I want only one cup of coffee". It's a national shame that most > scholars don't.
Tony Robinson wrote:
> Okay Jerry, I'll ask you what you personally find ambiguous. Rune > clearly has his own axe to grind based on his life history but I'm > trying to get to the core of what you and he find so objectionable about > the phrase "the detection of one particular target word and/or sound in > the background of all other possible words or any other realistic > sounds". >
Puzzling, isn't it? The ad specifies a research goal. Typically in signal processing those are not things you either meet or fail to meet, and everyone here has worked with that their whole career. Goals are usually elastic, e.g. "The radio technique we researched turned out to work pretty well in a variety of circumstances. Very competitive with current alternatives, and cheaper. I think we have a winner, unless....". In the ad, the research goal is to detect a sound amongst arbitrary ones. Only a fool would judge the outcome on a simple pass/fail basis. The sane approach is to judge it elastically, based on the rate of false alarms and the rate of failures to alarm. Seems like more people than Rune are having a bad week. :-) Regards, Steve
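The elastic criterion described here, judging a detector by its false-alarm rate and its miss rate rather than pass/fail, can be illustrated with a short Python sketch. The function name and the scores are made up for illustration; they are not results from any real system.

```python
from typing import List, Tuple


def error_rates(target_scores: List[float], nontarget_scores: List[float],
                threshold: float) -> Tuple[float, float]:
    """Return (miss_rate, false_alarm_rate) for a detector that fires
    whenever a score is >= threshold."""
    misses = sum(1 for s in target_scores if s < threshold)
    false_alarms = sum(1 for s in nontarget_scores if s >= threshold)
    return misses / len(target_scores), false_alarms / len(nontarget_scores)


if __name__ == "__main__":
    # Hypothetical detector scores, for illustration only.
    targets = [0.9, 0.8, 0.6, 0.4]     # trials where the target was present
    nontargets = [0.7, 0.3, 0.2, 0.1]  # trials containing only other sounds
    for t in (0.25, 0.5, 0.75):
        miss, fa = error_rates(targets, nontargets, t)
        print(f"threshold={t:.2f}  miss={miss:.2f}  false_alarm={fa:.2f}")
```

Sweeping the threshold trades one error for the other: here a low threshold misses nothing but raises false alarms, a high one removes false alarms at the cost of misses. A research result is then "this curve moved down", not "the problem is solved".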
Jerry Avins <jya@ieee.org> writes:

> Nevertheless, I distinguish between "all x is not y" and "not all x > is y", and "I only want one cup of coffee" and "I want only one cup > of coffee". It's a national shame that most scholars don't.
Quantifier scope depends on context. That's how language works. No shame in that. Best, John
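The quantifier-scope point can be stated in first-order notation. Writing $P(x)$ for "$x$ is an X" and $Q(x)$ for "$x$ is Y" (predicate names chosen here for illustration), the two readings are:

$$
\text{``all } x \text{ is not } y\text{''}:\quad \forall x\,\bigl(P(x) \Rightarrow \neg Q(x)\bigr)
$$

$$
\text{``not all } x \text{ is } y\text{''}:\quad \neg\,\forall x\,\bigl(P(x) \Rightarrow Q(x)\bigr) \;\equiv\; \exists x\,\bigl(P(x) \wedge \neg Q(x)\bigr)
$$

The first says no X is Y; the second says only that at least one X is not Y. Everyday English often uses the first wording to mean the second, which is the ambiguity Jerry objects to and the context-dependence John defends.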
morgan wrote:
> My colleague Dave Gelbart kindly posted this apparently controversial > ad for me, and Tony Robinson has stuck up for me; but it's probably a > good idea for me to interject a few things myself.
[-- snip --] Morgan, Your post certainly gives a somewhat different impression of your group than the ad did. Language is a funny thing. It is the only means people have to communicate in any elaborate way with each other. Most people don't read other people's minds, so they can only relate to others through what they hear or read. The choice of words and phrasing tends to make an impression on the listener or reader; people usually have no choice but to take what they hear or read at face value. Anything else would amount to guesswork, second-guessing and so on. If you really want to attract new people, use terms that professional generalists can understand. This ad is not aimed at an arbitrary John Doe, so technical terms in general are permitted. Failing to understand the importance of precise language would lead to a rapid decline of your group. Probably not while you are in charge, but you risk getting a successor who has not contemplated the lingo he or she hears in the halls, and who acts on the general, roundabout terms as if they were to be taken literally. If -- when -- that happens, all hell breaks loose. Believe me. You built that group. You set the standards. Now you know the risk. Rune