I've been tinkering with a problem involving detecting a set of tones in the presence of noise. I've looked at Goertzel and matched filter approaches, but I've had another idea. Assuming that I have a large corpus of sample waveforms, along with the tones each one is known to contain, could a machine learning approach be used? The idea is a "black box" (a set of filters?) with a set of parameters or coefficients, where the machine is taught to set the coefficients so that it detects the correct (known) tones in each example from the corpus. Presumably, after enough training on the example waveforms, the "black box" would have "learned" how to detect the tones in the noise?
That resembles a "genetic algorithm" (GA) process (it goes by other names too) more than AI. The BIG issue is how to set (train) limits, especially if this will be used within a safety-critical application. Without a true closed-form solution, there is no (known) way to "verify" AI or GA for "correctness", as there is no metric to use (a catch-22). The AI or GA may, like "us", guess at the answer, but too often those "gut" responses result in a punch to the gut!
Try FIR filtering. Lots of data on that, including stuff on "tunable" FIRs (e.g. tracking filter).
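To make the FIR suggestion concrete, here is a rough sketch of a band-pass FIR designed by the windowed-sinc method, "tunable" only in the sense that the centre frequency is a parameter. The function names, the 8 kHz sample rate, the 101 taps, and the 400 Hz bandwidth are all assumptions picked for illustration, not anything from the original problem.

```python
import math

def bandpass_fir(f_center, bandwidth, fs, num_taps=101):
    """Windowed-sinc band-pass FIR with a Hamming window (illustrative values)."""
    f_lo = (f_center - bandwidth / 2) / fs  # band edges in cycles/sample
    f_hi = (f_center + bandwidth / 2) / fs
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 2 * (f_hi - f_lo)
        else:
            ideal = (math.sin(2 * math.pi * f_hi * k)
                     - math.sin(2 * math.pi * f_lo * k)) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def fir_filter(taps, x):
    """Direct-form FIR convolution with zero initial state."""
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * x[n - k]
        out.append(acc)
    return out

def power(x):
    """Mean-square value of a sequence."""
    return sum(v * v for v in x) / len(x)

fs = 8000.0  # assumed sample rate
taps = bandpass_fir(f_center=1000.0, bandwidth=400.0, fs=fs)
in_band = [math.sin(2 * math.pi * 1000.0 * n / fs) for n in range(2000)]
out_band = [math.sin(2 * math.pi * 2000.0 * n / fs) for n in range(2000)]
# Skip the first 200 outputs so the filter start-up transient is excluded.
p_in = power(fir_filter(taps, in_band)[200:])
p_out = power(fir_filter(taps, out_band)[200:])
print(p_in, p_out)
```

Retuning is just a matter of calling `bandpass_fir` again with a different `f_center`; a real tracking filter would update the taps from a frequency estimate, which this sketch does not attempt.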
In the meantime I'll take a peek under the hood of FIR filters, at least for the "front end", the first decision point.
Since you're open to a machine learning solution and seemingly not tied to dedicated algorithms, I would suggest a deep learning approach. Generate moving 2-D spectrograms of your tones, for example with a frame size of 20 to 50 ms (or however long your tones are "quasi-stationary"), FFT size 1024, 50% overlap, a Hamming window, and at least 128 color levels. At that point you have a set of images, which is exactly what state-of-the-art CNNs (convolutional neural networks) are very good at.
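As a rough sketch of that recipe, the code below turns a waveform into a quantized spectrogram "image" using only the standard library (in practice you would reach for an FFT library instead). The 8 kHz sample rate, the 32 ms frame, the -80 dB floor, and the names `fft` and `spectrogram` are assumptions for illustration; the FFT size, overlap, window, and 128 levels follow the suggestion above.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def spectrogram(x, fs, frame_ms=32.0, nfft=1024, levels=128, floor_db=-80.0):
    """Return rows of integer levels (0..levels-1), one row per frame."""
    frame_len = int(fs * frame_ms / 1000.0)  # must not exceed nfft
    hop = frame_len // 2                     # 50% overlap
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]     # Hamming window
    rows = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [x[start + n] * window[n] for n in range(frame_len)]
        frame += [0.0] * (nfft - frame_len)  # zero-pad up to the FFT size
        spectrum = fft(frame)[: nfft // 2 + 1]
        rows.append([abs(c) ** 2 for c in spectrum])
    peak = max(max(r) for r in rows)
    image = []
    for r in rows:
        row = []
        for p in r:
            db = 10 * math.log10(max(p / peak, 1e-12))  # dB relative to the peak
            scaled = (max(db, floor_db) - floor_db) / -floor_db  # 0..1
            row.append(min(levels - 1, int(scaled * levels)))
        image.append(row)
    return image

fs = 8000.0  # assumed sample rate
tone = [math.sin(2 * math.pi * 1000.0 * n / fs) for n in range(4000)]
image = spectrogram(tone, fs)
print(len(image), len(image[0]))
```

Each row is one time frame and each column one frequency bin, so `image` can be handed to an image classifier as a single-channel picture. The dB floor decides how much of the noise background survives quantization, so it is worth tuning against real recordings.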
If you have an extensive training set for your tones, both with and without noise, you can probably get very good recognition accuracy.
This is essentially the "new approach" in speech recognition, although for us old guys, it's actually not new at all. Back in the 1980s they tried to do speech recognition this way using "expert systems", i.e. encoding the knowledge of a human who was expert at reading 2-D spectrograms (don't laugh, I kid you not). Unfortunately they lacked the horsepower (300+ W GPU boards) to support deep neural networks (multi-layer NNs) and huge training data sets, so they didn't get far.
Before you throw out the conventional techniques, it seems appropriate to ask a few questions.
1) Are you trying to just determine the existence of certain known tones in noise?
2) Are you trying to discriminate between known tones?
3) How much variability is there in the tones (e.g. ppm offset)?
4) How long are the tone bursts?
5) What is the allowable false detection and missed detection rate?
In my experience, a Goertzel is very nearly an optimal detector for pure tones. However, there are problems with fine discrimination among tones if they are low frequency compared to the sampling rate.
I view the Goertzel as a sympathetic oscillator. You drive it with a signal and if the tone exists within the signal, it will find it, provided that the tone is long enough.
There are even ways to build Goertzel-based detectors for tones that may have slight frequency offsets and short durations.
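For reference, a minimal sketch of the basic Goertzel recursion is below. The "sympathetic oscillator" is the two-state resonator in the loop, and the last line converts its final state into a squared magnitude at the target frequency. The 8 kHz sample rate, 400-sample block, the 770 Hz and 941 Hz test frequencies, and the noise level are assumptions for illustration; it does not attempt the offset-tolerant or short-burst variants mentioned above.

```python
import math
import random

def goertzel_power(x, f_target, fs):
    """Squared magnitude of the response at f_target over the block x."""
    w = 2 * math.pi * f_target / fs
    coeff = 2 * math.cos(w)
    s1 = s2 = 0.0
    for sample in x:
        s0 = sample + coeff * s1 - s2  # resonator driven by the input
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

random.seed(0)
fs = 8000.0      # assumed sample rate
n_samples = 400  # 50 ms block
# Unit-amplitude 770 Hz tone buried in unit-variance white Gaussian noise.
block = [math.sin(2 * math.pi * 770.0 * n / fs) + random.gauss(0.0, 1.0)
         for n in range(n_samples)]
p_present = goertzel_power(block, 770.0, fs)
p_absent = goertzel_power(block, 941.0, fs)
print(p_present, p_absent)
```

A detector would compare each power against a threshold chosen from the allowable false-detection and missed-detection rates. Longer blocks raise the tone's power relative to the noise but narrow the frequency range the resonator responds to, which is where the offset and duration questions come in.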
There is also the old technique in adaptive filtering called the "line enhancer", which reduces the noise relative to the signal when the signal of interest is periodic.
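A rough sketch of the line enhancer idea, assuming a normalized LMS update: an adaptive FIR predicts the current sample from delayed samples, and since only the periodic part is predictable across the delay, the predictor output is the "enhanced" tone. The tap count, delay, step size, sample rate, and test signal are all assumptions for illustration.

```python
import math
import random

def line_enhancer(x, num_taps=32, delay=1, mu=0.05):
    """Adaptive line enhancer with a normalized LMS update (illustrative values)."""
    w = [0.0] * num_taps
    y = [0.0] * len(x)
    for n in range(delay + num_taps - 1, len(x)):
        ref = [x[n - delay - k] for k in range(num_taps)]  # delayed input
        y[n] = sum(wk * r for wk, r in zip(w, ref))        # predicted sample
        err = x[n] - y[n]                                  # unpredictable part
        norm = sum(r * r for r in ref) + 1e-9
        step = mu * err / norm
        w = [wk + step * r for wk, r in zip(w, ref)]
    return y

def mse(a, b):
    """Mean squared difference between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

random.seed(1)
fs = 8000.0  # assumed sample rate
clean = [math.sin(2 * math.pi * 500.0 * n / fs) for n in range(6000)]
noisy = [s + random.gauss(0.0, 1.0) for s in clean]
enhanced = line_enhancer(noisy)
# Compare against the clean tone over the last 2000 samples, after adaptation.
err_in = mse(noisy[-2000:], clean[-2000:])
err_out = mse(enhanced[-2000:], clean[-2000:])
print(err_in, err_out)
```

With white noise a delay of one sample is enough to decorrelate the noise; coloured noise would need a longer delay. The enhancer output could then feed a Goertzel or other detector as a cleaner front end.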