
subtracting two audio samples, how?

Started by Bill Chiu March 13, 2004
Hi:

I've given this considerable thought, but my math sucks.  I have
two sound samples, each 10-20 seconds of sound at something like 44 kHz.
I need to subtract one from the other, so I end up with a prototype
and a difference, where the difference should be very small and
overall uses less storage.  Suppose both samples are extremely
similar to our human ears, and the beats of the samples match quite
well.  This problem is very similar to the joint-stereo
compression used by mp3/ogg/etc., if it is any different at all, but I'm
stuck at differencing.  Here are the things I've tried:

1. I could do the subtraction in the time domain, but the result is more
like a union of the two samples rather than a difference - the problem is
that the phase is all different even though they sound the same to our
ears.  The difference computed this way is useless to me because it is
too big.

2. I tried it in the DCT domain - the problem is that when I subtract the
coefficients of the two time series in this domain, the resulting
difference is still too large - after all, I'm not dealing with a
'stereo' difference, but rather similarity between samples that could
be out of sync at multiple places, with different - say - background
accompaniment.
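
A minimal sketch of why item 2 gives no improvement over item 1, assuming a toy case of one 500 Hz tone and a pure 90-degree phase offset between the two "takes" (the sample rate, length, and helper names are made up for illustration). An orthonormal DCT preserves energy, so the difference of the coefficients carries exactly as much energy as the time-domain difference; a phase offset costs the same in both domains.

```python
import math

def dct_ortho(x):
    """Orthonormal DCT-II of a real sequence (direct O(N^2) form)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def energy(x):
    """Sum of squared values."""
    return sum(v * v for v in x)

def tone(freq_hz, phase, n, rate_hz):
    """n samples of a unit-amplitude sine with the given starting phase."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz + phase) for i in range(n)]

RATE = 8000.0  # hypothetical sample rate, kept low so the demo runs quickly
N = 256        # 16 full periods of the 500 Hz tone

a = tone(500.0, 0.0, N, RATE)           # "take 1"
b = tone(500.0, math.pi / 2, N, RATE)   # "take 2": same tone, 90 degrees apart

time_diff = [p - q for p, q in zip(a, b)]
dct_diff = [p - q for p, q in zip(dct_ortho(a), dct_ortho(b))]

# Energy of each difference relative to the energy of one take.
ratio_time = energy(time_diff) / energy(a)
ratio_dct = energy(dct_diff) / energy(dct_ortho(a))

print("time-domain difference / signal energy:", ratio_time)
print("DCT-domain difference / signal energy: ", ratio_dct)
```

Both ratios come out the same, and both are larger than 1: the "difference" has more energy than either take on its own, which matches the "union rather than difference" impression.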

Are there existing good algorithms for dealing with this?  E.g.
combine time-warping with DCT?

Bill

p.s. Is this the same problem as removing one audio signal from
another - like noise cancellation - except that instead of creating an
opposite waveform, we have two very similar waveforms that are most
likely to have different phases all over the place?  E.g. samples of the
same performance recorded at different times.

Bill Chiu wrote:

> <snip>
> Are there existing good algorithms for dealing with this?  E.g.
> combine time-warping with DCT?

There might be something you can do in certain very specific situations,
but in general, you're out of luck. It's like trying to subtract two
pictures of a stormy ocean to depict a flat calm.

Jerry
--
Engineering is the art of making what you want from things you can get.

Jerry Avins wrote:
> There might be something you can do in certain very specific situations,
> but in general, you're out of luck. It's like trying to subtract two
> pictures of a stormy ocean to depict a flat calm.

All need not be lost (nice analogy though, Jerry!). You say the two
sound files are similar. How do you know that? Here are some
suggestions:

... the same sound recorded with two different microphones?
... the result of applying two different algorithms to the same
original sound?
... two speakers saying the same thing?
... two bands playing the same song (or one band playing the same song
twice)?

Whatever the case, if you have some a priori knowledge about why the two
sound files are similar, there is a good chance that you can use that to
your advantage.

It would also help to know your final goal. Perhaps somebody has
already done something similar.

Regards,
Andor

PS: Noise reduction does not work by subtracting some noise signal. It
is usually based on filtering and gating.

Andor,

The sound sources are closest to the example "one band playing the
same song in two different concerts," where between the two samples
the beat matches almost exactly to 'trained ears', but to that same
pair of ears there are clear but minor vocal, instrumental,
arrangement, and acoustic differences between the two recordings.

One of the things I'm interested in achieving is to listen only to
these sonic differences without the full song also being heard.

More interestingly, I'm looking for a way to compress these two very
similar samples so as to end up with one original (or mean) track
and another difference track, in such a way that with these two
tracks I can reproduce the original - lossy but 'high' quality.  The
second (difference) track needs to be made of mostly small values
(hence it compresses better than trying to compress the original
track(s)).
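
A minimal sketch of the mean-plus-difference split described above, assuming two equal-length tracks that are already sample-aligned (the function names and the sample values are made up for illustration). This is the same idea as mid/side coding; the difference track only comes out small when the two takes line up in time and phase.

```python
def encode_mean_diff(a, b):
    """Split two equal-length tracks into a mean track and a difference track."""
    if len(a) != len(b):
        raise ValueError("tracks must have the same length")
    mean = [(x + y) / 2.0 for x, y in zip(a, b)]
    diff = [(x - y) / 2.0 for x, y in zip(a, b)]
    return mean, diff

def decode_mean_diff(mean, diff):
    """Rebuild both tracks from the mean and difference tracks."""
    a = [m + d for m, d in zip(mean, diff)]
    b = [m - d for m, d in zip(mean, diff)]
    return a, b

# Two made-up, nearly identical "takes".
take1 = [0.10, 0.50, -0.30, 0.80]
take2 = [0.12, 0.48, -0.31, 0.79]

mean, diff = encode_mean_diff(take1, take2)
rec1, rec2 = decode_mean_diff(mean, diff)

print("difference track:", diff)
print("reconstructed take 1:", rec1)
print("reconstructed take 2:", rec2)
```

The split itself is lossless up to rounding; any loss would come from how the two tracks are quantized or coded afterwards.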

I have tested an algorithm that can efficiently identify similar parts
of the same or different audio samples, e.g. the chorus of a pop song.
If this problem of differencing similar samples can be solved, then we
can compress mp3/ogg far more efficiently by collapsing repeated
parts.

Bill




Bill Chiu wrote:
> <snip>
> One of the things I'm interested in achieving is to listen only to
> these sonic differences without the full song also being heard.
> <snip>

Bill,

Let me touch on just one of the difficulties ahead of you: time
differences. The sounds in the recordings consist of frequencies
between, say, 30 and 10,000 cycles per second. These sounds are made by
vibrating strings or air columns. A sound at 500 cycles per second is
completely canceled by a replica of equal strength that is delayed by
half a period (1 ms), the time sound needs to travel about a foot in
air. The precise time that a sound starts relative to another (put
differently, its phase) is not controllable by someone playing an
instrument. Two guitars doubled on the same part sound like two guitars,
not one louder one. If the situation were simple enough for addition
(hence also subtraction) to work, they would sound like one guitar, just
a bit louder.

Our ears don't directly bring sound-pressure waves to our consciousness.
Instead, we hear abstractions: pitches, timbres, phonemes, more. You
want to analyze your recordings on the basis of the abstractions you
hear. The simple techniques of DSP deal with waveforms. Abstraction is
much harder. As far as I know, no program can "take dictation" (the
dream of microphone to written score remains remote), yet even I, whose
musical training ended nearly 60 years ago, can perceive as distinct
four-part harmonies. You are asking for a lot!

Jerry
--
Engineering is the art of making what you want from things you can get.
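
A small numerical check of the 500-cycle example above, as a sketch under stated assumptions: the speed of sound is taken as 343 m/s (dry air at roughly 20 C), and the sample rate and function name are made up for illustration. It adds a tone to an equal-strength copy of itself, once with no delay and once with a half-period delay.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for dry air at about 20 C
FREQ_HZ = 500.0
RATE_HZ = 48000.0           # hypothetical sample rate

def rms_of_sum(delay_s, n=4800):
    """RMS of a 500 Hz tone plus an equal-strength copy delayed by delay_s seconds."""
    total = 0.0
    for i in range(n):
        t = i / RATE_HZ
        direct = math.sin(2 * math.pi * FREQ_HZ * t)
        delayed = math.sin(2 * math.pi * FREQ_HZ * (t - delay_s))
        total += (direct + delayed) ** 2
    return math.sqrt(total / n)

half_period_s = 0.5 / FREQ_HZ                 # 1 ms for a 500 Hz tone
path_m = SPEED_OF_SOUND_M_S * half_period_s   # distance sound covers in that time
path_ft = path_m / 0.3048

rms_in_phase = rms_of_sum(0.0)             # the two copies reinforce
rms_cancelled = rms_of_sum(half_period_s)  # the two copies cancel

print("half period: %.4f s, path: %.3f m (%.2f ft)" % (half_period_s, path_m, path_ft))
print("RMS with no delay:         ", rms_in_phase)
print("RMS with half-period delay:", rms_cancelled)
```

The same delay that cancels 500 Hz reinforces 1000 Hz, so a single fixed offset between two recordings affects every frequency differently.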
They are getting closer every day. I have heard some impressive demos of blind 
source separation, from a track with clearly differentiated voices (drums, bass, 
guitar, that sort of thing), easily good enough to perform remixing.  This is all 
part of the global effort towards MPEG-7, which also demands sound 
classification tools. Monophonic transcription is now very well established, at 
least for pitched sources.

The canonical MPEG-7 source distribution is available for anyone to download and 
explore:

http://www.lis.e-technik.tu-muenchen.de/research/bv/topics/mmdb/e_mpeg7.html


Put BSS together with beat detection and source classification, and full 
transcription of polyphonic sources (if distinct and not too dense in voices, I 
guess) is close to doable. Ditto for separating speakers in the manner of the 
cocktail party effect (needs spatial info, i.e. at least two microphones). Work 
is also intense on the much more difficult problem of homogeneous sources; 
see e.g. Dafx-03 (http://www.elec.qmul.ac.uk/dafx03/).  It is certainly far from 
easy, but I would hesitate to say "remote" now, especially if you have some a 
priori information, such as a known number of voices.


Richard Dobson



Jerry Avins wrote:
  ...
> Our ears don't directly bring sound-pressure waves to our consciousness.
> Instead, we hear abstractions: pitches, timbres, phonemes, more. You
> want to analyze your recordings on the basis of the abstractions you
> hear. The simple techniques of DSP deal with waveforms. Abstraction is
> much harder. As far as I know, no program can "take dictation" (the
> dream of microphone to written score remains remote), yet even I, whose
> musical training ended nearly 60 years ago, can perceive as distinct
> four-part harmonies. You are asking for a lot!
>
> Jerry

OK: I retract "remote" as it applies to the industry generally. We agree 
on difficult, but some programs to do difficult things are available for 
purchase. On the time scale that interests the OP, "remote" might be a 
reasonable description if no one can suggest available software.

Jerry
--
Engineering is the art of making what you want from things you can get.