
Video-equivalent of "pitch-shifting."

Started by Radium August 21, 2007
Jerry Avins wrote:

> <....snip..>
>
> If Radium doesn't take the trouble to understand your clear exposition,
> he's beyond help.
>
> Jerry
You sound like someone who has never followed a "Radium" thread before. ;-}

[The extent of the cross-post lists says much about his usenet ~mastery~]

Later...
Ron
On Wed, 22 Aug 2007 01:17:31 +0000, Stuart wrote:
> "BobG" <bobgardner@aol.com> wrote in message > news:1187739471.037333.144590@x40g2000prg.googlegroups.com... >> If you play the tape or record faster, the pitch shifts up. If colors >> are analogous to pitch, speeding up would be a shift to the blue, >> slowing down would be a red shift. > > Sound is physical - Light is electromagnetic radiation > > That would make a great Sci-Fi effect to depict 'beings' in a different > temporal dimension co-existing with us but what you describe is the effect > of motion linking the audio analogy to video which is not valid. Sound is an > air-pressure wave whose speed changes depending on the medium whereas light > is part of the electromagnetic spectrum whose speed is fixed to the speed of > light and except for some very high-end academic experiments never changes. > Never-the-less it's a good special effects used in a modified way in the BBC > production Ultra Violet.
I believe the point was about the Doppler effect, which they both experience, albeit in very different media. In fact, you can see this in water waves, if you have a boat that's going more slowly than the "speed of wave" in that body of water. But, yes, alas, other than moving at a significant fraction of c, I don't think there's any way to exploit blue-shift. ;-)

Hope This Helps!
Rich
On Aug 21, 4:13 pm, Radium <gluceg...@gmail.com> wrote:
> Anyways, Adobe Audition and voice-changers allow the frequencies of an
> audio signal to be shifted w/out low-pass filtering or changing the
> tempo. There are two video-equivalents of this because, while audio
> has only one frequency component [temporal], video has two [temporal
> and spatial].
Voice-changing and pitch-shifting algorithms work by duplicating or throwing away information which the ear can rarely detect, but which would be really obvious to the eye (a phoneme may sound the same with fewer or more excitation cycles, but a picture of your family would not look the same with some people missing, or with twin children added).

MPEG-2 video compression already does a coarse equivalent of time-domain pitch shifting via motion estimation and compensation, e.g. it throws away whole frames of video and repeats the spatial components from previous frames, sometimes skipping some new motion (leading to jerky patches of video playback if the compression rate is lower than a suitable information bandwidth). IMHO. YMMV.
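For concreteness, here is a minimal sketch of that duplicate-or-discard idea in Python with numpy. The grain and overlap sizes and all of the names are arbitrary illustrative choices, not taken from Audition or any product mentioned in this thread: time-stretch by overlap-adding crossfaded grains at a different output hop than the input hop, then resample back to the original duration so the pitch moves but the tempo does not.

    import numpy as np

    def time_stretch(x, factor, grain=1024, overlap=256):
        # Write overlapped, crossfaded grains at a different hop than they
        # are read: reading slower than writing duplicates waveform chunks,
        # reading faster discards them.
        hop_out = grain - overlap
        hop_in = max(1, int(round(hop_out / factor)))
        fade = np.linspace(0.0, 1.0, overlap)
        n = (len(x) - grain) // hop_in + 1
        y = np.zeros((n - 1) * hop_out + grain)
        for i in range(n):
            g = np.array(x[i * hop_in : i * hop_in + grain], dtype=float)
            pos = i * hop_out
            if i > 0:
                g[:overlap] *= fade                   # fade the new grain in...
                y[pos : pos + overlap] *= 1.0 - fade  # ...and the old tail out
            y[pos : pos + grain] += g
        return y

    def pitch_shift(x, semitones):
        # Stretch duration by the pitch ratio, then resample back to the
        # original length; the resampling scales every frequency while the
        # stretch restores the tempo.
        ratio = 2.0 ** (semitones / 12.0)
        y = time_stretch(x, ratio)
        t = np.linspace(0.0, len(y) - 1.0, num=len(x))
        return np.interp(t, np.arange(len(y)), y)

Real pitch-shifters choose grains pitch-synchronously or work on spectral frames; this naive version has audible artifacts, but it shows exactly where the duplicated (or discarded) cycles go.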

Radium wrote:

> The temporal video-equivalent would be changing the rate of back/forth,
> up/down, or other repetitive/cyclical movement [such as wing-flapping
> or flickering of lights] in the video signal without high/low-pass
> filtering, separating any portion of the video signal, or changing the
> speed at which the video signal plays -- just as voice-changers can
> lower the frequency of audio without changing the speed of the audio.
> Using a voice-changer to decrease the pitch of your voice will not
> cause your speech to slow down.
In order to change the 'spatial frequency' aspect of video data without altering the 'temporal' aspect, you have to either add or delete information interstitially and then play back the altered data at a compensated data rate. (Using the term 'data' in the most general sense here.)

jk
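One way to read jk's recipe, sketched in Python with numpy on a single 1-D scanline (the function name and the choice of a linear interpolator are mine, purely illustrative):

    import numpy as np

    def widen_features(line, factor):
        # Interpolate extra samples between the originals ("interstitial"
        # data), then read out only the original number of samples.  Every
        # feature becomes `factor` times wider, so all spatial frequencies
        # drop by `factor` -- at the cost of cropping away the part of the
        # scene that no longer fits, much as time-stretched audio truncated
        # back to its original duration loses its tail.
        n = len(line)
        xs = np.linspace(0.0, n - 1.0, num=int(n * factor))
        expanded = np.interp(xs, np.arange(n), np.asarray(line, dtype=float))
        return expanded[:n]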
> The spatial video-equivalent would be changing the "sharpness" of a
> still image without high/low-pass filtering or changing the size of
> the image.
>
> Below is an example of low-pass filtering in the spatial domain.
>
> Here is an original picture:
>
> http://www-dse.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/sab/report.normalimage.jpg
>
> Here is the picture after low-pass filtering:
>
> http://www-dse.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/sab/report.lopass.jpg
>
> I obviously do not want this at all. Low-pass filtering involves
> removing high-frequency components while preserving the low-frequency
> components. Once again, this is not what I want. If a device cannot
> handle high frequencies, then I would like all the frequencies of the
> signal to be down-shifted until the highest frequency is low enough to
> be acceptable to the device. This down-shifting should be done w/out
> slowing the speed of the signal -- or, in the case of spatial
> frequency, w/out increasing the size of the image.
>
> Thanks for your assistance, cooperation, and understanding,
>
> Radium
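Taken literally, what Radium asks for in the audio case is a fixed frequency offset, not the multiplicative scaling a pitch-shifter performs, and that operation does exist: single-sideband (heterodyne) frequency shifting. A minimal sketch, assuming scipy is available (the function name is mine). Two caveats: components pushed below 0 Hz fold back, and a fixed offset detunes harmonic relationships, which is why frequency-shifted speech sounds metallic.

    import numpy as np
    from scipy.signal import hilbert

    def freq_shift_down(x, shift_hz, sample_rate):
        # Form the analytic signal (x + j*Hilbert{x}), heterodyne it down
        # by a fixed shift_hz, and keep the real part.  Every component
        # moves down by the same offset; duration and tempo are untouched.
        t = np.arange(len(x)) / sample_rate
        return np.real(hilbert(x) * np.exp(-2j * np.pi * shift_hz * t))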
In article <1187813674.340127.26150@q3g2000prf.googlegroups.com>,
 "Ron N." <rhnlogic@yahoo.com> wrote:

> MPEG-2 video compression already does a coarse equivalent of
> time-domain pitch shifting via motion estimation and compensation,
> e.g. it throws away whole frames of video and repeats the spatial
> components from previous frames, sometimes skipping some new motion
> (leading to jerky patches of video playback if the compression rate is
> lower than a suitable information bandwidth).
Not really. MPEG-2 is frame rate conservative, end-to-end. In fact, the output frame rate is required by the standard to be *identical* to the input rate. That has to be the case for it to be able to handle NTSC or PAL delivered to ordinary TV sets. And if there are "jerky patches", that just means that it was improperly applied.

Isaac
On Aug 22, 8:31 pm, isw <i...@witzend.com> wrote:
> In article <1187813674.340127.26...@q3g2000prf.googlegroups.com>,
> "Ron N." <rhnlo...@yahoo.com> wrote:
>
> > MPEG-2 video compression already does a coarse
> > equivalent of time-domain pitch shifting via motion
> > estimation and compensation, e.g. it throws away
> > whole frames of video and repeats the spatial
> > components from previous frames, sometimes skipping
> > some new motion (leading to jerky patches of video
> > playback if the compression rate is lower than a
> > suitable information bandwidth).
>
> Not really. MPEG-2 is frame rate conservative, end-to-end. In fact, the
> output frame rate is required by the standard to be *identical* to the
> input rate. That has to be the case for it to be able to handle NTSC or
> PAL delivered to ordinary TV sets.
MPEG is frame rate conservative, but only the I frames are actually sent as full images. The P and B frames are made up out of some duplicated and possibly displaced contents of other frames, plus some quantized portion of an error vector, depending on the compression rate. Thus the data bandwidth required for P and B frames is a fraction of that typically required for the full image, as contained in a nearby I frame.

Some pitch-shifters or time-stretchers also duplicate and blend preceding and following periods of waveforms or spectral frame contents.
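A toy version of the P-frame mechanism Ron describes, in Python with numpy. This is conceptual only, not real MPEG-2: there is no motion search, no DCT, no entropy coding, and the quantizer step is an arbitrary choice. The predicted frame simply duplicates the reference, and only a coarsely quantized error vector is stored, which is why a P frame costs a fraction of an I frame's bits.

    import numpy as np

    def encode_p_frame(current, reference, qstep=16):
        # Store only a coarsely quantized error vector against the
        # reference frame; the residual is cheap compared with a full image.
        residual = current.astype(np.int16) - reference.astype(np.int16)
        return np.round(residual / qstep).astype(np.int8)

    def decode_p_frame(coded, reference, qstep=16):
        # Rebuild the frame: duplicated reference contents plus the
        # dequantized residual.
        rec = reference.astype(np.int16) + coded.astype(np.int16) * qstep
        return np.clip(rec, 0, 255).astype(np.uint8)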
In article <1187854626.453039.298850@q3g2000prf.googlegroups.com>,
 "Ron N." <rhnlogic@yahoo.com> wrote:

> On Aug 22, 8:31 pm, isw <i...@witzend.com> wrote:
> > In article <1187813674.340127.26...@q3g2000prf.googlegroups.com>,
> > "Ron N." <rhnlo...@yahoo.com> wrote:
> >
> > > MPEG-2 video compression already does a coarse
> > > equivalent of time-domain pitch shifting via motion
> > > estimation and compensation, e.g. it throws away
> > > whole frames of video and repeats the spatial
> > > components from previous frames, sometimes skipping
> > > some new motion (leading to jerky patches of video
> > > playback if the compression rate is lower than a
> > > suitable information bandwidth).
> >
> > Not really. MPEG-2 is frame rate conservative, end-to-end. In fact,
> > the output frame rate is required by the standard to be *identical*
> > to the input rate. That has to be the case for it to be able to
> > handle NTSC or PAL delivered to ordinary TV sets.
>
> MPEG is frame rate conservative, but only the I frames are
> actually sent as full images. The P and B frames are made
> up out of some duplicated and possibly displaced contents of
> other frames, plus some quantized portion of an error vector,
> depending on the compression rate. Thus the data bandwidth
> required for P and B frames is a fraction of that typically
> required for the full image, as contained in a nearby I frame.
Yup. Also, some of those frames are sent out of sequence (I or P "anchor" frames must be present first, in order for the interpolated frames to be recreated), but every frame has a representation of some sort in the stream, every frame gets put in its proper place by the decoder, and no frames are skipped.
> Some pitch-shifters or time-stretchers also duplicate and
> blend preceding and following periods of waveforms or
> spectral frame contents.
Yes again. The difference is that with MPEG video the "duplicating and blending" has zero effect on the frame rate (i.e., the temporal resolution).

There is an interesting sort-of exception to frame rate conservation, when film source is encoded at 24 FPS (actually about 23.98) and the decoder performs 3-2 pulldown to deliver the NTSC-required 29.97 FPS, but that's not germane to this discussion.

Isaac
On Aug 23, 10:02 am, isw <i...@witzend.com> wrote:
> There is an interesting sort-of exception to frame rate conservation,
> when film source is encoded at 24 FPS (actually about 23.98) and the
> decoder performs 3-2 pulldown to deliver the NTSC-required 29.97 FPS,
> but that's not germane to this discussion.
Actually, it is very germane, since 3-2 pulldown is similar to how some primitive audio pitch/rate-changing hardware worked: by duplicating small time-domain frames of audio at a fixed proportion and rate. Some MPEG decoders do "special effects" by varying the frame duplicate/drop fractions to slow down or speed up playback, using the same mechanism as for pulldown.
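At whole-frame granularity, 3-2 pulldown reduces to duplicating every fourth frame, turning 24 input frames into 30. (A sketch in Python; real pulldown alternates each film frame across three, then two interlaced fields, which this glosses over.)

    def pulldown_32(frames):
        # Emit one duplicate per four input frames: 24 fps * 5/4 = 30 fps.
        out = []
        for i, frame in enumerate(frames):
            out.append(frame)
            if i % 4 == 3:      # every fourth frame is shown twice
                out.append(frame)
        return out

Varying that duplicate/drop fraction, as Ron notes, yields slow or fast playback from the same machinery.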
On Aug 23, 3:37 am, "Ron N." <rhnlo...@yahoo.com> wrote:
> Some pitch-shifters or time-stretchers also duplicate and
> blend preceding and following periods of waveforms or
> spectral frame contents.
when i first saw the thread title, that's what i first thought about. actually, not pitch-shifting but more time-scaling.

it seems to me natural that if they were speeding up or slowing down the motion in the video (which means only for the temporal dimension, not either "x" or "y"), that would naturally correspond to the same speeding up or slowing down of tempo (without pitch change) of the audio. if you twist the knob that makes the actress talk faster (Ms. Motormouth), it shouldn't be upshifting her pitch to sound like Wendy or Bebe in South Park.

r b-j
On Aug 22, 1:37 am, isw <i...@witzend.com> wrote:
> The fact that there is essentially no relation between these two
> entities -- i.e. the data stream is comprised of a sequence of
> descriptions of a series of still images -- is the reason why what you
> want to do is almost certainly impossible.
>
> If you really want to try, the first step will be to devise a method
> of recording video that does not quantize the temporal axis; i.e. not
> using a sequence of still images.
can't we think of the intensity (and chroma components) of a particular point (x,y) of a still image as a sampled (at a rate of 30 Hz) value of a continuous-time signal that represents intensity at that point? i.e. we have I(x,y,t) being sampled as I(x,y,n*T). and then use some kinda interpolation to hypothetically reconstruct the "still" images in between the sequence we are given?

i imagine there would be some blurring, but if the resolution was very good to start with, would that not work, at least as a beginning point?

r b-j
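r b-j's proposal as a minimal sketch, again in Python with numpy; linear interpolation stands in for his "some kinda interpolation", and the function name is mine. Anything that moved between the two frames gets averaged across both positions, which is exactly the blurring he anticipates; motion-compensated interpolators exist largely to avoid that.

    import numpy as np

    def frame_between(frame_a, frame_b, alpha):
        # Per-pixel linear interpolation of I(x, y, t) between two 30 Hz
        # samples: alpha = 0 gives frame_a, alpha = 1 gives frame_b.
        a = np.asarray(frame_a, dtype=float)
        b = np.asarray(frame_b, dtype=float)
        return (1.0 - alpha) * a + alpha * b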