You may want to wait for more experienced posters to chime in, but I believe the answer to your question is no.
If the process applied to the audio had a mathematical inverse, then the answer would be yes.
I'm not an expert on phase vocoders, but I don't believe a phase vocoder has a mathematical inverse (in the general case).
As for using machine learning, I don't think that will help you either.
Best of luck,
Tone modification in a phase vocoder relies on time-scaling in the synthesis stage. Time-scaling works by using phase information to estimate the correct instantaneous frequencies. The modified signal no longer has an exact mathematical inverse, but you can still compute an approximate one.
If you want to undo the modification, bear in mind that you will again need to estimate phase information. It can be done, but as phase error accumulates, coherence is lost. The clearest place to notice this loss of coherence is in the transients: note attacks smear out and lose their natural sound. The same happens to the harmonic partials of a single note.
If your signal consists only of voice, you might get a reasonable result; not so much for music.
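To make the idea concrete, here is a minimal sketch (not production code, and not any specific library's API; all function names are illustrative) of a basic phase-vocoder time-stretch and its approximate inverse: stretch by rate r, then by 1/r. The phases are re-estimated on each pass, which is exactly where the coherence loss described above creeps in.

```python
import numpy as np

N_FFT, HOP = 1024, 256

def stft(x):
    win = np.hanning(N_FFT)
    return np.array([np.fft.rfft(win * x[i:i + N_FFT])
                     for i in range(0, len(x) - N_FFT, HOP)])

def istft(S):
    win = np.hanning(N_FFT)
    out = np.zeros(HOP * (len(S) - 1) + N_FFT)
    norm = np.zeros_like(out)
    for i, spec in enumerate(S):
        out[i * HOP:i * HOP + N_FFT] += win * np.fft.irfft(spec, N_FFT)
        norm[i * HOP:i * HOP + N_FFT] += win ** 2
    return out / np.maximum(norm, 1e-8)  # weighted overlap-add

def phase_vocoder(S, rate):
    """Resample the frame sequence by `rate`, re-estimating phases."""
    bins = S.shape[1]
    omega = 2 * np.pi * np.arange(bins) * HOP / N_FFT  # expected advance/hop
    phase = np.angle(S[0])
    frames = []
    for t in np.arange(0, len(S) - 1, rate):
        i, frac = int(t), t - int(t)
        mag = (1 - frac) * np.abs(S[i]) + frac * np.abs(S[i + 1])
        frames.append(mag * np.exp(1j * phase))
        # deviation from the expected phase advance, wrapped to [-pi, pi]
        dphi = np.angle(S[i + 1]) - np.angle(S[i]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        phase = phase + omega + dphi  # accumulate estimated phase
    return np.array(frames)

sr = 8000
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s of 440 Hz
slow = istft(phase_vocoder(stft(x), 0.5))         # 2x slower
back = istft(phase_vocoder(stft(slow), 2.0))      # approximate undo
```

For a steady sine like this the round trip comes back close to the original, but the accumulated phase error shows up as smearing on anything with sharp attacks or many partials, which is the effect described above.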
Hope that helps.