I have a DSP-based piece of audio equipment that shifts the pitch of incoming audio: it digitizes the input, processes it in the DSP, and converts back to analog in real time. The algorithm can shift pitch up or down over a range of +/- one octave, in factors of about (1.059)^N, where N is an integer (1.059 ≈ 2^(1/12), the semitone ratio used in music).
I decided to measure latency of this equipment and had expected to see something roughly along these lines:
Assuming the algorithm must first measure the incoming frequency, that automatically constrains the minimum time needed to make the determination. Audio usually has a lower bound of 20 Hz, and one period at 20 Hz is 1/(20 Hz) = 50 ms.
I measured latency using an oscilloscope - it tended to be right around 10ms and that surprised me.
For an incoming sine wave, I suppose one could measure a half period of audio (or even a quarter period)? But that seems problematic if the incoming signal is complex and/or harmonically rich, as music and instruments often are; relying on only a fraction of a period seems unreliable there.
So, what's the trick? Does anyone know anything about this type of algorithm? What about audio processing in general? What is the usual or expected time needed to process audio signals (digitizing, DSP filtering and conversion back to audio)?
Would the overlap and add method shed light on this?
If there's too much latency, it will affect the performer. 10 ms is about as much as I would expect to see before it starts to get obnoxious.
There are three sources of delay: conversion times, DSP effects, and buffering.
It takes a finite amount of time to convert signals between analog and digital. Generally speaking, this is a fast process, but when the converter uses oversampling or decimation, the associated filters can add appreciable delay.
DSP processes will have some delay. For example, any LTI system, such as a filter, has a group delay characteristic that is a function of frequency.
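As a concrete illustration (the sample rate and filter length here are assumed, not taken from the question): a linear-phase FIR filter delays every frequency by the same (N − 1)/2 samples, so even a modest filter contributes measurable latency.

```python
# Hypothetical example: latency of a linear-phase FIR filter.
fs = 48000        # assumed sample rate, Hz
num_taps = 257    # assumed FIR length

# A symmetric (linear-phase) FIR delays all frequencies equally,
# by (num_taps - 1) / 2 samples.
group_delay_samples = (num_taps - 1) / 2
group_delay_ms = 1000.0 * group_delay_samples / fs
print(group_delay_ms)  # 128 samples at 48 kHz, i.e. about 2.67 ms
```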
Buffering is a way to make the processing more efficient: in a DSP implementation, it is cheaper to process a block of samples at a time than to process samples individually. My guess is that this is the primary source of your 10 ms delay, but I'm speculating.
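To put rough numbers on that guess (block size and sample rate are assumed values, not known from the device): block processing typically costs at least one block period to fill the input buffer and another for the processed output to play out, so buffering latency is on the order of two block durations.

```python
fs = 48000    # assumed sample rate, Hz
block = 256   # assumed processing block size, samples

# One block period to fill the input buffer, one for the processed
# output to drain: roughly two block durations of latency.
latency_ms = 2 * block * 1000.0 / fs
print(latency_ms)  # about 10.7 ms, in the ballpark of the measured 10 ms
```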
Pitch shifting isn't exactly a textbook DSP process, so there are any number of ways it can be implemented. I would guess that a phase vocoder is being used, which is a type of overlap-add (OLA) technique. Maybe someone else can point you to some docs on the subject.
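To make the idea concrete, here is a toy numpy sketch of that approach (my own code, certainly not the device's actual algorithm; the function names, FFT size, and hop are arbitrary choices): time-stretch with a phase vocoder, then resample to restore the original duration, which scales the pitch by the stretch ratio.

```python
import numpy as np

def stretch(x, factor, n_fft=1024, hop=256):
    """Toy phase-vocoder time stretch: factor < 1 lengthens the signal."""
    win = np.hanning(n_fft)
    hop_a = hop * factor                          # analysis hop (synthesis hop = hop)
    k = np.arange(n_fft // 2 + 1)
    expected = 2.0 * np.pi * k * hop_a / n_fft    # nominal phase advance per analysis hop
    out = np.zeros(int(len(x) / factor) + n_fft)
    phase, prev, pos_out = None, None, 0
    for pos in np.arange(0, len(x) - n_fft, hop_a):
        frame = x[int(pos):int(pos) + n_fft] * win
        spec = np.fft.rfft(frame)
        if prev is None:
            phase = np.angle(spec)
        else:
            # deviation from the nominal advance, wrapped to [-pi, pi]
            dphi = np.angle(spec) - np.angle(prev) - expected
            dphi -= 2.0 * np.pi * np.round(dphi / (2.0 * np.pi))
            # rescale the true phase advance to the synthesis hop
            phase = phase + (expected + dphi) * (hop / hop_a)
        prev = spec
        out[pos_out:pos_out + n_fft] += np.fft.irfft(np.abs(spec) * np.exp(1j * phase)) * win
        pos_out += hop
    return out[:pos_out + n_fft]

def pitch_shift(x, semitones):
    """Shift pitch by N semitones: stretch time by r, then resample r times faster."""
    r = 2.0 ** (semitones / 12.0)                 # ~1.059 per semitone, as in the question
    y = stretch(x, 1.0 / r)
    n_out = min(len(x), int(len(y) / r))
    return np.interp(np.arange(n_out) * r, np.arange(len(y)), y)
```

Note that the analysis/synthesis frames themselves impose latency: a 1024-sample window at 48 kHz means about 21 ms must be buffered before the first output frame is complete, which is one reason real-time units keep their windows (or their method) small.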
Music usually spans 20 Hz to 20 kHz, but voice only covers about 200 Hz to 3.2 kHz. My guess is that the device doesn't handle frequencies as low as you assume. The period at 200 Hz is 5 ms, which is consistent with the 10 ms latency you are measuring.
Feed in a signal with clear envelope variation, like an AM signal, and measure the delay between the envelope midpoints. Hope it helps.
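A numpy sketch of that measurement (the 10 ms device delay here is simulated, and the smoothing window is an arbitrary choice): rectify and smooth both signals to get envelopes, then cross-correlate the envelopes to read off the lag.

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs                    # 1 s of test signal
am = (1 + 0.8 * np.sin(2 * np.pi * 7 * t)) * np.sin(2 * np.pi * 1000 * t)

delay = 80                                # simulate a 10 ms device delay (80 samples)
delayed = np.concatenate([np.zeros(delay), am])[:len(am)]

def envelope(x, win=50):
    # crude envelope follower: rectify, then moving-average smooth
    return np.convolve(np.abs(x), np.ones(win) / win, mode="same")

e_in, e_out = envelope(am), envelope(delayed)
e_in, e_out = e_in - e_in.mean(), e_out - e_out.mean()
# peak of the cross-correlation gives the lag of e_out relative to e_in
lag = np.argmax(np.correlate(e_out, e_in, mode="full")) - (len(e_in) - 1)
print(1000.0 * lag / fs)                  # about 10 ms
```

Cross-correlating the envelopes rather than eyeballing midpoints makes the estimate insensitive to the waveform distortion the pitch shifter introduces.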
Question - how did you measure the latency of a pitch-shifted signal?
More precisely: if pitch shifting is defined as preserving either the amplitudes of the harmonics or the spectral envelope, the shape of the time-domain signal can change substantially. I don't expect an oscilloscope to be the appropriate tool.