I recently stumbled across the Differentiable DSP (DDSP) library for Python/TensorFlow. It combines classical DSP techniques with a neural network in an analysis/synthesis configuration.
Thought I would provide some links, as it may be of interest to those involved with audio work. The code is open source on GitHub.
Personally, I found both the concept and the audio samples fascinating.
The authors are Jesse Engel, Lamtharn (Hanoi) Hantrakul, Chenjie Gu, and Adam Roberts. Wonder if any of them are lurking on dsprelated? :-)
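For anyone curious about the core idea, DDSP's harmonic synthesizer sums sinusoids at integer multiples of a fundamental, with all control signals (f0, per-harmonic amplitudes) kept differentiable so a network can learn them. Here is a toy NumPy sketch of that concept; it is not the library's actual API, and the sample rate, harmonic count, and amplitudes are made-up illustration values:

```python
import numpy as np

def harmonic_synth(f0_hz, harmonic_amps, sample_rate=16000):
    """Toy additive (harmonic) synthesizer in the spirit of DDSP.

    f0_hz:         per-sample fundamental frequency, shape (n_samples,)
    harmonic_amps: per-sample amplitude of each harmonic,
                   shape (n_samples, n_harmonics)
    """
    n_samples, n_harmonics = harmonic_amps.shape
    # Instantaneous frequency of the k-th harmonic is k * f0.
    harmonic_freqs = f0_hz[:, None] * np.arange(1, n_harmonics + 1)[None, :]
    # Integrate frequency to get phase; every step here is a smooth operation,
    # which is what lets gradients flow back to the controls in the real thing.
    phases = 2 * np.pi * np.cumsum(harmonic_freqs / sample_rate, axis=0)
    # Silence any harmonic that crosses Nyquist to avoid aliasing.
    amps = np.where(harmonic_freqs < sample_rate / 2, harmonic_amps, 0.0)
    return np.sum(amps * np.sin(phases), axis=1)

# One second of a 220 Hz tone with 8 harmonics in a 1/k rolloff.
n = 16000
f0 = np.full(n, 220.0)
amps = 0.1 * np.tile(1.0 / np.arange(1, 9)[None, :], (n, 1))
audio = harmonic_synth(f0, amps)
```

In the library itself, an encoder network predicts those control signals from input audio, and the loss is computed on the resynthesized output, so the "DSP" part is baked into the model rather than bolted on afterward.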
I took a quick look; it does sound interesting. The current state of the art in speech recognition has done a 180 from the old days: now they depend on DNNs to train on and recognize everything, including all types of background noise and interference, even silence. Yes, I said silence -- forget the last 30 years of voice activity detection and background noise classification algorithm development, who needs that, right?
I've been thinking that maybe millennial data scientists have run just a tad amok, seeing as how the brain performs exceptionally well on 60 W, whereas these folks want to run systems with many layers that consume kilowatts, even for inference. Maybe they shouldn't have turned 180 degrees, but more like 90 ... any work that combines judicious use of signal processing with DNNs could be promising. -Jeff
Embedded GPUs and FPGAs consume maybe 7 watts, not kilowatts. This research is important as we integrate audio processing with other forms of machine cognition.
For some minimal inference like basic facial recognition or a wake word, perhaps. For anything serious, like say Alexa, it's 10+ kW counting the servers and GPU boards running multiple concurrent models to handle just one conversation.
So yes, FPGAs and Tegra-type GPUs consuming 7 W are good for IoT, but that's not serious AI, and it's nowhere even remotely close to what the brain does on its 60 W.