I've been tinkering with a problem involving detecting a set of tones in the presence of noise. I've looked at Goertzel and matched-filter approaches, but I've had another idea: assuming I have a large corpus of sample waveforms, along with the tones each is known to contain, could a machine learning approach be used? A set of parameters or coefficients would be fed into some sort of "black box" (a set of filters?), and the machine would be taught to set the coefficients so that it detects the correct (known) tones in each example from the corpus. Presumably, after enough training on the example waveforms, the "black box" would have "learned" how to detect the tones in the noise…
Tim's First Rule of Pragmatism: if you can do it with a rock, don't take the time to build a hammer (or even to get one from the shed).
There are a lot of known algorithms for detecting tones. If that's all you need to do, use 'em. Simpler, better, sooner -- what's not to like? Only if you're doing something that's outside what the existing algorithms can handle should you reach for something like ML.
I remember going through this with fuzzy logic, back in the 20th century. Fuzzy logic solves a few problems in control, and does a pretty good job of it. But people were absolutely wild to jump on the fuzzy logic bandwagon, to the point where there were startups making specialized chips that supported it (hint: if you're working for a startup that has a buzzword in its name, make sure to get your cash up front. Don't work for future stock options!).
To a lesser extent, this happened with neural nets. Neural nets were the precursor to the larger domain of machine learning, but I know exactly one guy whose employers actually made money on neural nets (for a while, there were certainly people making out like bandits, but their investors were usually taking a bath). He's now working in machine learning.
As with fuzzy logic, there were application-specific neural net chips out there -- and you don't see those on the market any more either.
I'm sure this would work, but if all you're trying to do is detect tones, this sort of solution is like using an n-squared algorithm to solve a problem that can be solved in n*log(n). (I had assumed Goertzel is n*log(n); it's actually O(n) per tone, which only strengthens the point.)
If you wanted to use ML, I'd put a single perceptron on the output of the Goertzel algorithm.
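To make that concrete, here's a minimal sketch of the idea -- all function names and parameters here are my own invention, not a standard API. A Goertzel stage computes the power at the tone frequency, and a single logistic unit (the simplest "neural net") learns the detection threshold from labeled example waveforms:

```python
import math

def goertzel(samples, sample_rate, freq):
    """Goertzel power at one frequency, normalized so a unit-amplitude
    on-bin tone gives a value near 1.0."""
    n = len(samples)
    k = round(n * freq / sample_rate)          # nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1       # Goertzel recurrence
    return (s2 * s2 + s1 * s1 - coeff * s1 * s2) / (n / 2) ** 2

def train_detector(waveforms, labels, sample_rate, freq, epochs=500, lr=1.0):
    """Fit a single logistic unit on the Goertzel feature by plain
    stochastic gradient descent over the labeled corpus."""
    w, b = 0.0, 0.0
    feats = [goertzel(s, sample_rate, freq) for s in waveforms]
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))   # predicted probability
            w += lr * (y - p) * x                      # gradient step
            b += lr * (y - p)
    return w, b

def detect(samples, sample_rate, freq, w, b):
    """Fire when the learned linear score on the Goertzel power is positive."""
    return w * goertzel(samples, sample_rate, freq) + b > 0
```

In effect the training just learns where to put the threshold on the Goertzel power -- which is also why the full ML machinery is overkill here.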
Deep learning is already being used for tone detection in the presence of noise, on a far more complex basis than a basic set of tones -- that's speech recognition.
If you search for TDNN you can see the basics: sliding FFT analysis producing a 2-D spectrogram that becomes the input to a convolutional/deep neural network. You would not want a mel frequency scale, and possibly not the time-delay aspect (the T in TDNN) for auto-alignment of tones, but otherwise the concepts are the same.
My guess is this could be overkill for a small set of tones such as DTMF, but on the other hand, if you face distortions such as frequency warping, "thin" tones, etc., then it could be extremely effective, assuming you have suitable training data.
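For reference, the sliding-FFT front end is the easy part. A toy version in Python (using a naive DFT for clarity -- a real implementation would use an FFT, and the names here are my own):

```python
import math, cmath

def spectrogram(samples, frame_len=64, hop=32):
    """Sliding-window DFT: each row is the magnitude spectrum of one frame.
    The resulting 2-D time/frequency grid is what a TDNN-style
    convolutional network would take as its input 'image'."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        mags = []
        for k in range(frame_len // 2):          # keep positive frequencies
            acc = sum(x * cmath.exp(-2j * math.pi * k * i / frame_len)
                      for i, x in enumerate(frame))
            mags.append(abs(acc))
        frames.append(mags)
    return frames
```

A tone shows up as a bright horizontal stripe across the rows; the network's job is just to recognize that pattern under distortion.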
I assume by "set of parameters or coefficients" you are referring to features of each waveform that would be fed into a neural network. With that, I would expect the transforms identified below for speech recognition to be a good place to start.
But then you state, "the machine could be taught to set the coefficients ...," which confuses me a bit. Do you want your inference to predict coefficients, or tones? I'll assume you want to (1) find features to extract from each waveform and then (2) train a neural network that detects specific tones.
Knowing which neural network structure to use would be tricky: an MLP and the models used in speech processing are very different. Also, if you plan to "port" your model (let's say you use TensorFlow to build it) to an ARM-based device, that will factor in as well -- real-time sampling vs. your corpus of waveforms, floating point vs. fixed, etc.
If this is purely theoretical, then have fun and try lots of different things. As long as you can explain your findings (perhaps this is a class paper/project), you will probably be ok. If this is something you want done on a real device, then I would consider what's doing the number crunching.
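On the floating-vs-fixed-point question, the Goertzel core at least ports to integer math fairly cleanly. Here's a rough Python simulation of a Q1.14 fixed-point version, roughly what you'd run on a Cortex-M with no FPU -- illustrative only, since a real port needs overflow analysis for your block length and input scale:

```python
import math

def coeff_for(freq, sample_rate, n):
    """Quantize the Goertzel coefficient 2*cos(2*pi*k/n) to Q1.14
    for the bin k nearest the target frequency."""
    k = round(n * freq / sample_rate)
    return round(2.0 * math.cos(2.0 * math.pi * k / n) * (1 << 14))

def goertzel_fixed(samples, coeff_q14):
    """Fixed-point Goertzel: int16 PCM input, Q1.14 coefficient,
    wide integer accumulators; no floating point in the loop."""
    s1 = s2 = 0
    for x in samples:
        s = x + ((coeff_q14 * s1) >> 14) - s2   # >>14 rescales the Q1.14 product
        s1, s2 = s, s1
    # squared magnitude in arbitrary fixed-point units
    return s2 * s2 + s1 * s1 - ((coeff_q14 * s1) >> 14) * s2
```

The output units are arbitrary, so the detection threshold has to be calibrated in the same fixed-point scale -- which is exactly the kind of detail that makes porting a trained floating-point model to fixed point non-trivial.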
The minimum processor would be a GPU (graphics processing unit), the most common device for deep learning, since training involves massive matrix-algebra calculations. This is a classic image-classification problem translated to sound. I agree with those who say this is a Rube Goldberg mousetrap. The original questioner would do well to take a class in deep learning.
You could also consider an FPGA vs. a GPU -- specifically the Ultra96 and Xilinx DNNDK. GPUs are great for training, but you can see better results (lower latency, faster inference) with an FPGA.