I am currently trying to regenerate piano sounds using machine learning techniques. Here are some of the results; as you can hear, they need improvement. Would any of you know what this effect could be? If I knew what causes it, I could add some sort of post-processing to counter the effect.
Thanks a lot!
Original 1 : real1.wav
Generated 1 : gen1.wav
Original 2 : real2.wav
Generated 2 : gen2.wav
The generated files are heavily clipped versions of the real files. If you amplify the real file by a factor of 1000 and then clip/saturate it at ±1.0, it will sound very similar to the generated file.
Try normalizing the generated file to +-0.99 before saving or playing back.
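That normalization step might look something like this, assuming the generated signal is held as a float NumPy array (the variable name `gen` here is just a stand-in, not anything from your code):

```python
import numpy as np

# Stand-in for the generated signal: a float array with a huge peak level.
gen = np.sin(np.linspace(0.0, 2.0 * np.pi, 1000)) * 500.0

# Peak-normalize to +-0.99 before saving or playing back.
peak = np.max(np.abs(gen))
if peak > 0:
    gen = gen * (0.99 / peak)

# gen now lies within [-0.99, 0.99] and can be written out safely,
# e.g. with the soundfile package: sf.write("gen1_norm.wav", gen, 16000)
```

The 0.99 target just leaves a little headroom below full scale so nothing lands exactly on the clipping boundary.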
Agreed that it's totally saturated/clipped, especially if you "view" it in something like Audacity. But this may need to be fixed early in the generation process, and not in post-processing, depending on how it's done. If the generation is done in integers, then attenuate first. If it is floating point and NOT in saturation, then post-processing to normalize to, say, 16-bit WAV files is OK. Since saturation/clipping is nonlinear, once it's done it cannot be fixed: the original amplitude information is lost forever.
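A sketch of that float-to-16-bit path, assuming the signal is a float NumPy array that has not already saturated (the helper name `to_int16_pcm` is hypothetical):

```python
import numpy as np

def to_int16_pcm(x, headroom=0.99):
    """Peak-normalize a float signal and quantize it to 16-bit PCM.

    This only helps if x has NOT already been clipped: saturation is
    nonlinear, so once it happens the original amplitudes are gone.
    """
    x = np.asarray(x, dtype=np.float64)
    peak = np.max(np.abs(x))
    if peak > 0:
        x = x * (headroom / peak)  # bring the peak down to +-headroom
    return np.round(x * 32767.0).astype(np.int16)
```

Usage would be something like `to_int16_pcm(generated_signal)` right before writing the WAV file, in place of a raw cast that overflows.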
And I'm curious: why "machine learning"? That works best if one has a good grasp of the target model. Not a good candidate for algorithmic annealing (a process to tailor and select the best algorithm). Why not a simple sampling technique? There's tons of info on that, including at https://www.pjrc.com/teensy/td_libs_Audio.html. Doing a full synthesis requires knowing the physics of a piano and the end result tends to be a simplified version of the full physical processes.
Yes, it was clipped and I fixed it, thank you. My goal is not to sample sound; I am actually trying to build a model that generates music with an autoencoder, if that means anything to you. The first step is to encode the sound into a smaller space (called the latent space) and decode it accurately enough: that is where I had the clipping problem. At first I thought my model wasn't good, and that maybe I could post-process the signal to get back to the original sound, but it turned out that something was wrong in the way I saved the file. Eventually, my goal will be to generate music by tweaking the latent space and decoding it.
It was indeed clipped, and I found a solution to the problem, thanks!
I looked at real1.wav and gen1.wav in SoundForge. Real1 is 16-bit integer and gen1 is 32-bit IEEE floating point with 90.2 dB of gain with respect to real1, so much gain that at first glance it seems clipped. Gen1 totally overloads SoundForge's signal-statistics computation. After decreasing gen1 by 80 dB and increasing real1 by 10.19 dB, the waveforms look identical in detail.
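Those gain figures come from comparing peak levels in dB; a quick way to reproduce that kind of comparison, assuming both files are loaded as float arrays (the sample values below are stand-ins, not the actual file contents):

```python
import numpy as np

def peak_db(x):
    """Peak level of a signal in dB relative to full scale (1.0)."""
    return 20.0 * np.log10(np.max(np.abs(x)))

real = np.array([0.1, -0.05])       # stand-in for real1.wav samples
gen = np.array([3000.0, -1500.0])   # stand-in for gen1.wav samples

# Relative gain of the generated file over the real one, in dB.
gain_db = peak_db(gen) - peak_db(real)
```

A relative gain on the order of 90 dB (a factor of roughly 30,000 in amplitude) is exactly what turns a normal float signal into one that looks solidly clipped against a ±1.0 full scale.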
I eventually realized that something was wrong in the way I saved my files after generating them, thank you !