Cross-synthesis is the technique of impressing the spectral envelope of one sound on the flattened spectrum of another. A typical example is to impress speech on various natural sounds, such as ``talking wind.'' Let's call the first signal the ``modulating'' signal, and the other the ``carrier'' signal. Then the modulator may be a voice, and the carrier may be any spectrally rich sound such as wind, rain, creaking noises, flute, or other musical instrument sound. Commercial ``vocoders'' (§G.10, §G.5) used as musical instruments consist of a keyboard synthesizer (for playing the carrier sounds) and a microphone for picking up the voice of the performer (to extract the modulation envelope).

Cross-synthesis may be summarized as consisting of the following steps:

  1. Perform a Short-Time Fourier Transform (STFT) of both the modulator and carrier signals (§7.1).

  2. Compute the spectral envelope of each time-frame (as described in the next section).

  3. Optionally divide the spectrum of each carrier frame by its own spectral envelope, thereby flattening it.

  4. Multiply each (flattened) carrier frame by the spectral envelope of the corresponding modulator frame, thereby replacing the carrier's envelope with the modulator's envelope.
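
The four steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the implementation behind the audio example: the function names (`stft`, `istft`, `spectral_envelope`, `cross_synthesize`) and the parameters (`n_fft`, `hop`, `n_ceps`, `flatten`) are hypothetical, and the envelope is estimated here by simple cepstral smoothing (keeping only the low-quefrency part of the real cepstrum), which is one common choice swapped in for illustration. Resynthesis by windowed overlap-add is likewise an assumption, since the steps above stop at the modified spectrum.

```python
import numpy as np

def stft(x, n_fft=1024, hop=256):
    """Step 1: Hann-windowed STFT, shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft + 1)[:-1]           # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X, n_fft=1024, hop=256):
    """Weighted overlap-add resynthesis (an assumed final stage)."""
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(X, n=n_fft, axis=1) * window
    out = np.zeros(hop * (len(X) - 1) + n_fft)
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)           # undo the window overlap gain

def spectral_envelope(X, n_fft, n_ceps=30):
    """Step 2: smooth each frame's log magnitude by cepstral windowing."""
    log_mag = np.log(np.abs(X) + 1e-9)            # small offset avoids log(0)
    ceps = np.fft.irfft(log_mag, n=n_fft, axis=1) # real cepstrum per frame
    lifter = np.zeros(n_fft)
    lifter[:n_ceps] = 1.0                         # keep low quefrencies ...
    lifter[n_fft - n_ceps + 1:] = 1.0             # ... and their mirror image
    return np.exp(np.fft.rfft(ceps * lifter, axis=1).real)

def cross_synthesize(modulator, carrier, n_fft=1024, hop=256,
                     n_ceps=30, flatten=True):
    n = min(len(modulator), len(carrier))
    # Zero-pad so that every input sample is covered by a full set of frames.
    pad = lambda x: np.pad(np.asarray(x[:n], dtype=float), n_fft)
    M = stft(pad(modulator), n_fft, hop)          # step 1
    C = stft(pad(carrier), n_fft, hop)
    env_m = spectral_envelope(M, n_fft, n_ceps)   # step 2
    if flatten:                                   # step 3 (optional)
        C = C / spectral_envelope(C, n_fft, n_ceps)
    Y = C * env_m                                 # step 4
    return istft(Y, n_fft, hop)[n_fft:n_fft + n]  # trim the padding
```

With a voice recording as `modulator` and a spectrally rich sound as `carrier` (both one-dimensional arrays at the same sampling rate), `cross_synthesize(modulator, carrier)` returns a signal of the shorter input's length. A smaller `n_ceps` gives a smoother envelope; a larger one follows more spectral detail, eventually including the harmonics of the voice.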

For an audio example of cross-synthesis (a ``talking organ''), see
