
Video-equivalent of "pitch-shifting."

Started by Radium August 21, 2007
Jerry Avins wrote:

> <....snip..>
>
> If Radium doesn't take the trouble to understand your clear exposition,
> he's beyond help.
>
> Jerry
You sound like someone who has never followed a "Radium" thread before. ;-}

[The extent of the cross-post lists says much about his usenet ~mastery~]

Later...
Ron
On Wed, 22 Aug 2007 01:17:31 +0000, Stuart wrote:
> "BobG" <bobgardner@aol.com> wrote in message > news:1187739471.037333.144590@x40g2000prg.googlegroups.com... >> If you play the tape or record faster, the pitch shifts up. If colors >> are analogous to pitch, speeding up would be a shift to the blue, >> slowing down would be a red shift. > > Sound is physical - Light is electromagnetic radiation > > That would make a great Sci-Fi effect to depict 'beings' in a different > temporal dimension co-existing with us but what you describe is the effect > of motion linking the audio analogy to video which is not valid. Sound is an > air-pressure wave whose speed changes depending on the medium whereas light > is part of the electromagnetic spectrum whose speed is fixed to the speed of > light and except for some very high-end academic experiments never changes. > Never-the-less it's a good special effects used in a modified way in the BBC > production Ultra Violet.
I believe the point was about the Doppler effect, which they both experience, albeit in very different media. In fact, you can see this in water waves, if you have a boat that's going more slowly than the "speed of wave" in that body of water. But, yes, alas, other than moving at a significant fraction of c, I don't think there's any way to exploit blue-shift. ;-)

Hope This Helps!
Rich
On Aug 21, 4:13 pm, Radium <gluceg...@gmail.com> wrote:
> Anyways, Adobe Audition and voice-changers allow the frequencies of an
> audio signal to be shifted w/out low-pass filtering or changing the
> tempo. There are two video-equivalents of this because, while audio
> has only one frequency component [temporal], video has two [temporal
> and spatial].
Voice-changing and pitch-shifting algorithms work by duplicating or throwing away information which the ear can rarely detect, but which would be really obvious to the eye (a phoneme may sound the same with fewer or more excitation cycles, but a picture of your family would not look the same with some people missing, or with twin children added).

MPEG-2 video compression already does a coarse equivalent of time-domain pitch shifting via motion estimation and compensation, e.g. it throws away whole frames of video and repeats the spatial components from previous frames, sometimes skipping some new motion (leading to jerky patches of video playback if the compression rate is lower than a suitable information bandwidth). IMHO. YMMV.
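For concreteness, here is a minimal sketch of that duplicate-or-discard idea in Python with numpy. The grain and overlap sizes and all of the names are arbitrary illustrative choices, not taken from Audition or any product mentioned in this thread: time-stretch by overlap-adding crossfaded grains at a different output hop than the input hop, then resample back to the original duration so the pitch moves but the tempo does not.

    import numpy as np

    def time_stretch(x, factor, grain=1024, overlap=256):
        # Write overlapped, crossfaded grains at a different hop than they
        # are read: reading slower than writing duplicates waveform chunks,
        # reading faster discards them.
        hop_out = grain - overlap
        hop_in = max(1, int(round(hop_out / factor)))
        fade = np.linspace(0.0, 1.0, overlap)
        n = (len(x) - grain) // hop_in + 1
        y = np.zeros((n - 1) * hop_out + grain)
        for i in range(n):
            g = np.array(x[i * hop_in : i * hop_in + grain], dtype=float)
            pos = i * hop_out
            if i > 0:
                g[:overlap] *= fade                   # fade the new grain in...
                y[pos : pos + overlap] *= 1.0 - fade  # ...and the old tail out
            y[pos : pos + grain] += g
        return y

    def pitch_shift(x, semitones):
        # Stretch duration by the pitch ratio, then resample back to the
        # original length; the resampling scales every frequency while the
        # stretch restores the tempo.
        ratio = 2.0 ** (semitones / 12.0)
        y = time_stretch(x, ratio)
        t = np.linspace(0.0, len(y) - 1.0, num=len(x))
        return np.interp(t, np.arange(len(y)), y)

Real pitch-shifters choose grains pitch-synchronously or work on spectral frames; this naive version has audible artifacts, but it shows exactly where the duplicated (or discarded) cycles go.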

Radium wrote:

> The temporal video-equivalent would be changing the rate of back/forth,
> up/down, or other repetitive/cyclical movement [such as wing-flapping
> or flickering of lights] in the video signal without high/low-pass
> filtering, separating any portion of the video signal, or changing the
> speed at which the video signal plays -- just as voice-changers can
> lower the frequency of audio without changing the speed of the audio.
> Using a voice-changer to decrease the pitch of your voice will not
> cause your speech to slow down.
In order to change the 'spatial frequency' aspect of video data without altering the 'temporal' aspect, you have to either add or delete information interstitially and then play back the altered data at a compensated data rate. (Using the term 'data' in the most general sense here.)

jk
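One way to read jk's recipe, sketched in Python with numpy on a single 1-D scanline (the function name and the choice of a linear interpolator are mine, purely illustrative):

    import numpy as np

    def widen_features(line, factor):
        # Interpolate extra samples between the originals ("interstitial"
        # data), then read out only the original number of samples.  Every
        # feature becomes `factor` times wider, so all spatial frequencies
        # drop by `factor` -- at the cost of cropping away the part of the
        # scene that no longer fits, much as time-stretched audio truncated
        # back to its original duration loses its tail.
        n = len(line)
        xs = np.linspace(0.0, n - 1.0, num=int(n * factor))
        expanded = np.interp(xs, np.arange(n), np.asarray(line, dtype=float))
        return expanded[:n]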
> The spatial video-equivalent would be changing the "sharpness" of a
> still image without high/low-pass filtering or changing the size of
> the image.
>
> Below is an example of low-pass filtering in the spatial domain.
>
> Here is an original picture:
>
> http://www-dse.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/sab/report.normalimage.jpg
>
> Here is the picture after low-pass filtering:
>
> http://www-dse.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/sab/report.lopass.jpg
>
> I obviously do not want this at all. Low-pass filtering involves
> removing high-frequency components while preserving the low-frequency
> components. Once again, this is not what I want. If a device cannot
> handle high frequencies, then I would like all the frequencies of the
> signal to be down-shifted until the highest frequency is low enough to
> be acceptable to the device. This down-shifting should be done w/out
> slowing the speed of the signal -- or, in the case of spatial
> frequency, w/out increasing the size of the image.
>
> Thanks for your assistance, cooperation, and understanding,
>
> Radium
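Taken literally, what Radium asks for in the audio case is a fixed frequency offset, not the multiplicative scaling a pitch-shifter performs, and that operation does exist: single-sideband (heterodyne) frequency shifting. A minimal sketch, assuming scipy is available (the function name is mine). Two caveats: components pushed below 0 Hz fold back, and a fixed offset detunes harmonic relationships, which is why frequency-shifted speech sounds metallic.

    import numpy as np
    from scipy.signal import hilbert

    def freq_shift_down(x, shift_hz, sample_rate):
        # Form the analytic signal (x + j*Hilbert{x}), heterodyne it down
        # by a fixed shift_hz, and keep the real part.  Every component
        # moves down by the same offset; duration and tempo are untouched.
        t = np.arange(len(x)) / sample_rate
        return np.real(hilbert(x) * np.exp(-2j * np.pi * shift_hz * t))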
In article <1187813674.340127.26150@q3g2000prf.googlegroups.com>,
 "Ron N." <rhnlogic@yahoo.com> wrote:

> MPEG-2 video compression already does a coarse equivalent of
> time-domain pitch shifting via motion estimation and compensation,
> e.g. it throws away whole frames of video and repeats the spatial
> components from previous frames, sometimes skipping some new motion
> (leading to jerky patches of video playback if the compression rate is
> lower than a suitable information bandwidth).
Not really. MPEG-2 is frame rate conservative, end-to-end. In fact, the output frame rate is required by the standard to be *identical* to the input rate. That has to be the case for it to be able to handle NTSC or PAL delivered to ordinary TV sets. And if there are "jerky patches", that just means that it was improperly applied.

Isaac
On Aug 22, 8:31 pm, isw <i...@witzend.com> wrote:
> In article <1187813674.340127.26...@q3g2000prf.googlegroups.com>,
> "Ron N." <rhnlo...@yahoo.com> wrote:
>
> > MPEG-2 video compression already does a coarse
> > equivalent of time-domain pitch shifting via motion
> > estimation and compensation, e.g. it throws away
> > whole frames of video and repeats the spatial
> > components from previous frames, sometimes skipping
> > some new motion (leading to jerky patches of video
> > playback if the compression rate is lower than a
> > suitable information bandwidth).
>
> Not really. MPEG-2 is frame rate conservative, end-to-end. In fact, the
> output frame rate is required by the standard to be *identical* to the
> input rate. That has to be the case for it to be able to handle NTSC or
> PAL delivered to ordinary TV sets.
MPEG is frame rate conservative, but only the I frames are actually sent as full images. The P and B frames are made up out of some duplicated and possibly displaced contents of other frames, plus some quantized portion of an error vector, depending on the compression rate. Thus the data bandwidth required for P and B frames is a fraction of that typically required for the full image, as contained in a nearby I frame.

Some pitch-shifters or time-stretchers also duplicate and blend preceding and following periods of waveforms or spectral frame contents.
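A toy version of the P-frame mechanism Ron describes, in Python with numpy. This is conceptual only, not real MPEG-2: there is no motion search, no DCT, no entropy coding, and the quantizer step is an arbitrary choice. The predicted frame simply duplicates the reference, and only a coarsely quantized error vector is stored, which is why a P frame costs a fraction of an I frame's bits.

    import numpy as np

    def encode_p_frame(current, reference, qstep=16):
        # Store only a coarsely quantized error vector against the
        # reference frame; the residual is cheap compared with a full image.
        residual = current.astype(np.int16) - reference.astype(np.int16)
        return np.round(residual / qstep).astype(np.int8)

    def decode_p_frame(coded, reference, qstep=16):
        # Rebuild the frame: duplicated reference contents plus the
        # dequantized residual.
        rec = reference.astype(np.int16) + coded.astype(np.int16) * qstep
        return np.clip(rec, 0, 255).astype(np.uint8)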
In article <1187854626.453039.298850@q3g2000prf.googlegroups.com>,
 "Ron N." <rhnlogic@yahoo.com> wrote:

> On Aug 22, 8:31 pm, isw <i...@witzend.com> wrote:
> > In article <1187813674.340127.26...@q3g2000prf.googlegroups.com>,
> > "Ron N." <rhnlo...@yahoo.com> wrote:
> >
> > > MPEG-2 video compression already does a coarse
> > > equivalent of time-domain pitch shifting via motion
> > > estimation and compensation, e.g. it throws away
> > > whole frames of video and repeats the spatial
> > > components from previous frames, sometimes skipping
> > > some new motion (leading to jerky patches of video
> > > playback if the compression rate is lower than a
> > > suitable information bandwidth).
> >
> > Not really. MPEG-2 is frame rate conservative, end-to-end. In fact,
> > the output frame rate is required by the standard to be *identical*
> > to the input rate. That has to be the case for it to be able to
> > handle NTSC or PAL delivered to ordinary TV sets.
>
> MPEG is frame rate conservative, but only the I frames are
> actually sent as full images. The P and B frames are made
> up out of some duplicated and possibly displaced contents of
> other frames, plus some quantized portion of an error vector,
> depending on the compression rate. Thus the data bandwidth
> required for P and B frames is a fraction of that typically
> required for the full image, as contained in a nearby I frame.
Yup. Also, some of those frames are sent out of sequence (I or P "anchor" frames must be present first, in order for the interpolated frames to be recreated), but every frame has a representation of some sort in the stream, every frame gets put in its proper place by the decoder, and no frames are skipped.
> Some pitch-shifters or time-stretchers also duplicate and
> blend preceding and following periods of waveforms or
> spectral frame contents.
Yes again. The difference is that with MPEG video the "duplicating and blending" has zero effect on the frame rate (i.e., the temporal resolution).

There is an interesting sort-of exception to frame rate conservation, when film source is encoded at 24 FPS (actually about 23.98) and the decoder performs 3-2 pulldown to deliver the NTSC-required 29.97 FPS, but that's not germane to this discussion.

Isaac
On Aug 23, 10:02 am, isw <i...@witzend.com> wrote:
> There is an interesting sort-of exception to frame rate conservation,
> when film source is encoded at 24 FPS (actually about 23.98) and the
> decoder performs 3-2 pulldown to deliver the NTSC-required 29.97 FPS,
> but that's not germane to this discussion.
Actually, it is very germane, since 3-2 pulldown is similar to how some primitive audio pitch/rate-changing hardware worked: by duplicating small time-domain frames of audio at a fixed proportion and rate. Some MPEG decoders do "special effects" by varying the frame duplicate/drop fractions to slow down or speed up playback, using the same mechanism as for pulldown.
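At whole-frame granularity, 3-2 pulldown reduces to duplicating every fourth frame, turning 24 input frames into 30. (A sketch in Python; real pulldown alternates each film frame across three, then two interlaced fields, which this glosses over.)

    def pulldown_32(frames):
        # Emit one duplicate per four input frames: 24 fps * 5/4 = 30 fps.
        out = []
        for i, frame in enumerate(frames):
            out.append(frame)
            if i % 4 == 3:      # every fourth frame is shown twice
                out.append(frame)
        return out

Varying that duplicate/drop fraction, as Ron notes, yields slow or fast playback from the same machinery.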
On Aug 23, 3:37 am, "Ron N." <rhnlo...@yahoo.com> wrote:
> Some pitch-shifters or time-stretchers also duplicate and
> blend preceding and following periods of waveforms or
> spectral frame contents.
when i first saw the thread title, that's what i first thought about. actually, not pitch-shifting but more time-scaling.

it seems to me natural that if they were speeding up or slowing down the motion in the video (which means only for the temporal dimension, not either "x" or "y"), that would naturally correspond to the same speeding up or slowing down of tempo (without pitch change) of the audio. if you twist the knob that makes the actress talk faster (Ms. Motormouth), it shouldn't be upshifting her pitch to sound like Wendy or Bebe in South Park.

r b-j
On Aug 22, 1:37 am, isw <i...@witzend.com> wrote:
> The fact that there is essentially no relation between these two
> entities -- i.e. the data stream is comprised of a sequence of
> descriptions of a series of still images -- is the reason why what you
> want to do is almost certainly impossible.
>
> If you really want to try, the first step will be to devise a method
> of recording video that does not quantize the temporal axis; i.e. not
> using a sequence of still images.
can't we think of the intensity (and chroma components) of a particular point (x,y) of a still image as a sampled (at a rate of 30 Hz) value of a continuous-time signal that represents intensity at that point? i.e. we have I(x,y,t) being sampled as I(x,y,n*T). and then use some kinda interpolation to hypothetically reconstruct the "still" images in between the sequence we are given?

i imagine there would be some blurring, but if the resolution was very good to start with, would that not work, at least as a beginning point?

r b-j
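r b-j's proposal as a minimal sketch, again in Python with numpy; linear interpolation stands in for his "some kinda interpolation", and the function name is mine. Anything that moved between the two frames gets averaged across both positions, which is exactly the blurring he anticipates; motion-compensated interpolators exist largely to avoid that.

    import numpy as np

    def frame_between(frame_a, frame_b, alpha):
        # Per-pixel linear interpolation of I(x, y, t) between two 30 Hz
        # samples: alpha = 0 gives frame_a, alpha = 1 gives frame_b.
        a = np.asarray(frame_a, dtype=float)
        b = np.asarray(frame_b, dtype=float)
        return (1.0 - alpha) * a + alpha * b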