DSPRelated.com
Forums

Different Resampling Rates for Segments of a Sound File

Started by jeffdod November 18, 2005
I am having a problem with an application I am developing and don't
know where to look for help. I thought I would try posing some
questions here.

I have an application where an analog sound source is recorded at a
*very* slow rate (about 1/4 of the intended playback speed). Also, the
machine playing the audio does not run at a regulated speed, which
introduces a certain amount of "wow" into the recording. However, I
also have a text file that contains timestamps that effectively divide
the total audio recording into many irregularly-sized chunks. Each
individual chunk needs to be resized (resampled?) so that it is exactly
1/23.976 seconds in duration. This is always a downsampling operation,
because the average size of the original segments is about 0.15 seconds
long.
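
A rough sketch of the arithmetic described above, in C. The 48000 Hz output rate and the names `out_boundary` and `chunk_ratio` are assumptions for illustration; the post does not state the actual sample rate. Rounding the cumulative boundary, rather than each chunk length separately, is one way to keep rounding error from building up over many chunks.

```c
#include <assert.h>

/* Assumed for illustration; the post does not give the output sample rate. */
#define SAMPLE_RATE 48000.0
#define FRAME_RATE  23.976

/* Output sample index at which chunk k begins (k >= 0).
   Rounding the cumulative position, not each chunk length,
   keeps the error below one sample no matter how many chunks there are. */
long out_boundary(long k)
{
    assert(k >= 0);
    return (long)((double)k * SAMPLE_RATE / FRAME_RATE + 0.5);
}

/* Conversion ratio (output frames / input frames) for chunk k,
   given how many input frames the timestamps assign to it. */
double chunk_ratio(long k, long input_frames)
{
    assert(input_frames > 0);
    long out_frames = out_boundary(k + 1) - out_boundary(k);
    return (double)out_frames / (double)input_frames;
}
```

For a 0.15 second chunk at the assumed rate (7200 input frames), `chunk_ratio(0, 7200)` comes out near 0.278, i.e. a decimation by roughly 3.6.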

Right now I have a program that reads each of these chunks from an
audio file, figures out a real-number decimation rate to take that
particular segment down to 1/23.976 seconds, and I decimate it. Each
chunk is written out to a new sound file, so that all the resampled
chunks play back as one audio file.

My problem is that there are some unusual artifacts (aliasing?) in the
final product. I have tried many different ways of downsampling,
including the "secret rabbit code" library, the libresample library,
Microsoft DirectShow Editing Services, and my own custom decimation
routine. They all produce strange artifacts.

Is there a way that anyone knows of to resample tiny segments of audio
with varying rates and produce a smooth-sounding result?

- Jeff D

jeffdod wrote:
> Is there a way that anyone knows of to resample tiny segments of audio
> with varying rates and produce a smooth-sounding result?

Secret Rabbit Code : http://www.mega-nerd.com/SRC/ allows time-varying sample rates. It would be relatively easy to write a program that specifies the conversion ratio with each input sample and then iteratively modifies these conversion ratios to get the correction factor you want.

Erik
--
Erik de Castro Lopo

Hello Erik,

That is what I did, using both the simple and fuller APIs. However,
the data produced by doing this has strange artifacts in it--blips,
pops, and repetition of tiny audio segments over the top of the
"normal" sounding track. The data produced by the simple API was better
than the full API. I discovered the reason for this was that the
process calls for the full API would sometimes return fewer data
samples than I had requested for the output buffer. This would leave
gaps that sounded strange. The simple API would always return the
correct number of output samples. For instance, my input buffer might
have 9610 floats in it, and I would request 4004 floats as the output.
However, it would frequently return 3893 floats or some other value
instead.

Can you think of any reason why this might be? Any help is *greatly*
appreciated!

Jeff D.

Erik,

Although...let me make sure I understand what you are saying. What I did
was to loop over each segment, make a separate process call for each
segment with a different SRC_DATA. Each time before calling the process
function I would put in the new input segment size and the desired
output segment size, as well as the ratio as
ratio=outputsize/inputsize. Is this what you mean by iteratively
changing the conversion ratio, or did I perhaps miss something? I did
wonder what the purpose was for the function that changes the
conversion ratio on the fly when you could program this in the SRC_DATA
struct anyway.

Jeff D.

jeffdod wrote:
> Hello Erik,
>
> That is what I did, using both the simple and fuller API's.

The simple API simply is not capable of doing what you want.

> However,
> the data produced by doing this has strange artifacts in it--blips,
> pops, and repetition of tiny audio segments over the top of the
> "normal" sounding track. The data produced by the simple API was better
> than the full API.

It's highly likely that you were not using either correctly.

> I discovered the reason for this was that the
> process calls for the full API would sometimes return fewer data
> samples than I had requested for the output buffer.

It's highly likely that you were not using either correctly.

> This would leave
> gaps that sounded strange. The simple API would always return the
> correct number of output samples. For instance, my input buffer might
> have 9610 floats in it, and I would request 4004 floats as the output.
> However, it would frequently return 3893 floats or some other value
> instead.
>
> Can you think of any reason why this might be? Any help is *greatly*
> appreciated!

It's highly likely that you were not using either correctly.

I'll put together a demo program for you.

Erik

jeffdod wrote:
> Erik,
>
> Although...let me make sure I understand what you saying. What I did
> was to loop over each segment, make a separate process call for each
> segment with a different SRC_DATA. Each time before calling the process
> function I would put in the new input segment size and the desired
> output segment size, as well as the ratio as
> ratio=outputsize/inputsize. Is this what you mean by iteratively
> changing the conversion ratio, or did I perhaps miss something? I did
> wonder what the purpose was for the function that changes the
> conversion ratio on the fly when you could program this in the SRC_DATA
> struct anyway.

I'll put together a demo for you.

Erik

Erik de Castro Lopo wrote:
> > This would leave
> > gaps that sounded strange. The simple API would always return the
> > correct number of output samples. For instance, my input buffer might
> > have 9610 floats in it, and I would request 4004 floats as the output.
> > However, it would frequently return 3893 floats or some other value
> > instead.
> >
> > Can you think of any reason why this might be? Any help is *greatly*
> > appreciated!
>
> Its highly likely that you were not useing either correctly.
>
> I'll put together a demo program for you.

OK, here it is:

    http://www.mega-nerd.com/SRC/timewarp-file.c

It uses libsndfile for file I/O and libsamplerate for the time warping. I leave compiling it on your OS as an exercise.

The warping is done according to this array:

    typedef struct
    {   sf_count_t index ;
        double     ratio ;
    } TIMEWARP_FACTOR ;

    static TIMEWARP_FACTOR warp [] =
    {   {     0 , 1.00000001 },
        { 20000 , 1.01000000 },
        { 20200 , 1.00000001 },
        { 40000 , 1.20000000 },
        { 40300 , 1.00000001 },
        { 60000 , 1.10000000 },
        { 60400 , 1.00000001 },
        { 80000 , 1.50000000 },
        { 81000 , 1.00000001 },
    } ;

The first entry in the array sets the starting conversion ratio to 1.00000001. All the input samples from index 0 to index 20000 will have that same conversion ratio. From input sample index 20000 to 20200, the conversion ratio jumps to 1.01 and then drops back down to 1.00000001.

Now I'm just re-reading the original problem you posed:

> I have an application where an analog sound source is recorded at a
> *very* slow rate (about 1/4 of the intended playback speed). Also, the
> machine playing the audio does not run at a regulated speed, which
> introduces a certain amount of "wow" into the recording. However, I
> also have a text file that contains timestamps that effectively divide
> the total audio recording into many irregularly-sized chunks. Each
> individual chunk needs to be resized (resampled?) so that it is exactly
> 1/23.976 seconds in duration. This is always a downsampling operation,
> because the average size of the original segments is about 0.15 seconds
> long.

I'm assuming that the audio segments that you have recorded are really supposed to be contiguous. If that's the case, you should concatenate all the segments to produce a single sound file.

You will need to replace the array named warp above with index/ratio pairs that fix any time base changes. Generation of this warp array will probably be iterative if you are doing the correction by ear.

HTH,
Erik

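
Since the original poster already has timestamps for every segment, the index/ratio pairs could in principle be generated directly rather than by ear. The following is a minimal sketch of that idea, not part of the demo program: `WARP_ENTRY` and `build_warp` are hypothetical names, `long long` stands in for libsndfile's `sf_count_t` so the sketch compiles on its own, and it assumes the timestamps have already been converted to input sample indices.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the demo's index/ratio pairs; long long replaces
   sf_count_t so this compiles without libsndfile (an assumption). */
typedef struct {
    long long index;  /* input sample index where this ratio takes effect */
    double    ratio;  /* output frames per input frame from that index on */
} WARP_ENTRY;

/* Hypothetical helper. boundaries[0..n_segments] are the input sample
   indices that delimit n_segments contiguous segments. Every segment is
   mapped onto target_frames output frames. Writes n_segments entries. */
void build_warp(const long long *boundaries, size_t n_segments,
                double target_frames, WARP_ENTRY *out)
{
    for (size_t k = 0; k < n_segments; k++) {
        long long in_frames = boundaries[k + 1] - boundaries[k];
        assert(in_frames > 0);  /* boundaries must be strictly increasing */
        out[k].index = boundaries[k];
        out[k].ratio = target_frames / (double)in_frames;
    }
}
```

With boundaries at 0, 7200 and 14000 and a target of 2002 frames per segment, this yields ratios of about 0.278 and 0.294 starting at input indices 0 and 7200.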
Erik de Castro Lopo wrote:
> OK, here it is:
>
> http://www.mega-nerd.com/SRC/timewarp-file.c

One thing I would suggest is that you use this program to modify a file containing 3 seconds of a 440 Hz sine wave. You should hear the modification quite clearly.

Erik

Erik wrote:
>> Its highly likely that you were not using either correctly. <<
Well, you are probably right! I greatly appreciate you putting together a demo program for me! I will check it out immediately.

Gratefully,
Jeff D.

Are the blips and pops at the segment boundaries?  If so, that's to be expected 
unless you take special care to handle these cases.  I would test your algorithm 
on a large chunk (many seconds) of audio to make sure it is clean apart from 
segment boundaries.  If indeed the boundaries are the problem, you will need to 
use audio from the previous and/or next segments in calculating the samples near 
the boundary.  Maybe that is what Erik is working on for you?
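
The boundary handling described above can be sketched as follows. This is only an illustration of carrying state across segments, not how libsamplerate works internally: it uses plain linear interpolation, which has no anti-aliasing filter and would itself alias when downsampling by a factor of about 4. `RESAMP_STATE` and `resample_segment` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* State carried from one segment to the next. */
typedef struct {
    double pos;   /* fractional read position relative to the next segment's
                     first sample; -1 refers to the previous segment's last */
    float  prev;  /* last input sample of the previous segment */
} RESAMP_STATE;

/* Resample one segment by linear interpolation, continuing from where the
   previous call stopped. ratio = output frames per input frame.
   out_cap should be at least n_in * ratio + 2.
   Returns the number of output samples written. */
size_t resample_segment(RESAMP_STATE *st, const float *in, size_t n_in,
                        double ratio, float *out, size_t out_cap)
{
    assert(n_in > 0 && ratio > 0.0);
    const double step = 1.0 / ratio;  /* input samples per output sample */
    size_t n_out = 0;

    /* Stop once the sample to the right of pos would fall in the next segment. */
    while (st->pos < (double)n_in - 1.0) {
        assert(n_out < out_cap);
        long   i    = (st->pos < 0.0) ? -1 : (long)st->pos;
        double frac = st->pos - (double)i;
        float  a    = (i < 0) ? st->prev : in[i];  /* reach back across boundary */
        float  b    = in[i + 1];
        out[n_out++] = (float)((1.0 - frac) * a + frac * b);
        st->pos += step;
    }

    /* Re-express the position relative to the next segment and keep the
       one sample of history that interpolation across the boundary needs. */
    st->pos -= (double)n_in;
    st->prev = in[n_in - 1];
    return n_out;
}
```

Start from a zeroed `RESAMP_STATE`, call once per segment in order with that segment's ratio, and append the outputs. Because the fractional position carries over, the number of samples produced for a given segment can differ by one from `n_in * ratio`; the total stays consistent. A band-limited converter needs more history than one sample, but the principle of keeping state between calls is the same.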

-- 
Jon Harris