Reply by Jon Harris April 6, 2005
We discussed this in quite a bit of detail when this thread first started.  To
summarize, storing both minimum and maximum is the best way to do this as it
allows more accurate drawing of asymmetrical waveforms.  On the other hand,
storing a single value that is the maximum absolute value cuts your storage
requirements in half and still allows for drawing a decent envelope for most
typical audio waveforms.  The choice is up to you.

Based on your current implementation, you could of course cut your storage
requirement in half by truncating the data to 16-bit, which is more than enough
for display purposes.  Also, you might find it more efficient to store the
maximum and minimum values in adjacent locations rather than all the maximums
followed by all the minimums (i.e. interleave the maxs/mins).  That way, when
you go to draw a single block, you will be accessing 2 adjacent memory
locations, rather than 2 locations separated in memory by a large offset.  This
would be especially helpful if your data grew so large as to require virtual
memory.
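To make the interleaved layout concrete, here is a minimal Python sketch (the function name and the 111-sample block size are just taken from this thread for illustration; a real implementation would likely do this in C over 16-bit values):

```python
def build_peak_data(samples, block_size=111):
    """Store each block's minimum and maximum in adjacent slots.

    Interleaving as (min0, max0, min1, max1, ...) means drawing one
    block touches two adjacent memory locations instead of two
    locations separated by a large offset -- friendlier to caches
    and to virtual memory.
    """
    peaks = []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        peaks.append(min(block))
        peaks.append(max(block))
    return peaks
```

With this layout, `peaks[2*i]` holds block i's minimum and `peaks[2*i + 1]` its maximum.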

-- 
Jon Harris
SPAM blocked e-mail address in use.  Replace the ANIMAL with 7 to reply.


<seijin@gmail.com> wrote in message
news:1112767024.512845.189000@o13g2000cwo.googlegroups.com...
> I've been working on this project and it's coming along rather nicely.
> I've got it reading in samples and displaying the waveform.  I'm rather
> pleased about it so far.
>
> But here's what I'm doing - as an example I have a 24 minute audio file
> (16 bit stereo) with a data section that is 276,456,960 bytes large.
> Based on my current code of shrinking the file down to about 5mb in
> memory that is roughly 111 samples per block.  Currently I'm
> allocating 5,000,000 twice.  One of the 5mb is for storing a maximum
> value for the block and the other is for storing a minimum.  Regardless
> of the original bit rate, I'm storing the samples as LONGs.  Very
> unoptimized but I wanted to see if I could get the wav graphed first.
>
> So my question - is that the proper thing to do?  Do I need to store a
> minimum and maximum value for each block?  And then when graphing I
> determine the number of blocks per pixel and graph the determined
> minimum/maximum for that block based off of the original block's
> min/max values.
>
> I feel like I'm doing something stupid here but I can't quite figure it
> out.
>
> Otherwise - it's working great ^_^  Simple yet good.  If storing
> min/max values like I'm doing is the proper way, I won't mind it.  I
> just didn't know if I should be storing something else.  Possibly a
> single value of some kind.  Once I get the basics figured out, I'll
> work on storing the samples based on the original bitrate and other
> things.  Thanks for everything so far!
Reply by seij...@gmail.com April 6, 2005
I've been working on this project and it's coming along rather nicely.
I've got it reading in samples and displaying the waveform.  I'm rather
pleased about it so far.

But here's what I'm doing - as an example I have a 24 minute audio file
(16 bit stereo) with a data section that is 276,456,960 bytes large.
Based on my current code of shrinking the file down to about 5mb in
memory that is roughly 111 samples per block.   Currently I'm
allocating 5,000,000 twice.  One of the 5mb is for storing a maximum
value for the block and the other is for storing a minimum.  Regardless
of the original bit rate, I'm storing the samples as LONGs.  Very
unoptimized but I wanted to see if I could get the wav graphed first.

So my question - is that the proper thing to do?  Do I need to store a
minimum and maximum value for each block?  And then when graphing I
determine the number of blocks per pixel and graph the determined
minimum/maximum for that block based off of the original block's
min/max values.

I feel like I'm doing something stupid here but I can't quite figure it
out.

Otherwise - it's working great ^_^  Simple yet good.  If storing
min/max values like I'm doing is the proper way, I won't mind it.  I
just didn't know if I should be storing something else.  Possibly a
single value of some kind.  Once I get the basics figured out, I'll
work on storing the samples based on the original bitrate and other
things.  Thanks for everything so far!
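The graphing step described above, collapsing several stored blocks into one pixel, could be sketched like this (a hedged illustration; the names are made up, and it assumes the min/max arrays from the message):

```python
def pixel_extents(block_mins, block_maxs, blocks_per_pixel):
    """Collapse stored per-block min/max values into one (min, max)
    pair per pixel, without re-reading the original audio data."""
    extents = []
    for i in range(0, len(block_mins), blocks_per_pixel):
        extents.append((min(block_mins[i:i + blocks_per_pixel]),
                        max(block_maxs[i:i + blocks_per_pixel])))
    return extents
```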

Reply by Jon Harris March 22, 2005
"robert bristow-johnson" <rbj@audioimagination.com> wrote in message
news:BE661188.5802%rbj@audioimagination.com...
> in article 3ablk8F4lfq80U1@individual.net, Jon Harris at
> goldentully@hotmail.com wrote on 03/22/2005 17:48:
>
> > And keep in mind these types of "massively zoomed out" displays are
> > usually just rough pictures of the envelope to help you identify major
> > features.  You wouldn't typically be trying to select something as
> > fine as 1 screen pixel when zoomed out like that.
>
> yeah, you're probably right about that.  i just like consistent and
> predictable user interface (and DSP algorithms) and it would be an annoyance
> to me if there was a particular zoom setting where it goes from "chunky" to
> "pristine".  i don't have Cool Edit (i don't think there is one for the Mac)
> but Pro Tools, as i recall, had a set of zoom ratios and they were related
> by powers of 2.
With normal audio, the switch is not noticeable at all.  Only when I used
sub-audible sine waves, so that the envelope itself looked like a sine wave,
could I detect the switch.  For me, being restricted to power-of-2 zooming
would be a major limitation!
Reply by robert bristow-johnson March 22, 2005
in article 3ablk8F4lfq80U1@individual.net, Jon Harris at
goldentully@hotmail.com wrote on 03/22/2005 17:48:

> And keep in mind these types of "massively zoomed out" displays are
> usually just rough pictures of the envelope to help you identify major
> features.  You wouldn't typically be trying to select something as
> fine as 1 screen pixel when zoomed out like that.
yeah, you're probably right about that.  i just like consistent and
predictable user interface (and DSP algorithms) and it would be an annoyance
to me if there was a particular zoom setting where it goes from "chunky" to
"pristine".  i don't have Cool Edit (i don't think there is one for the Mac)
but Pro Tools, as i recall, had a set of zoom ratios and they were related
by powers of 2.

--

r b-j rbj@audioimagination.com

"Imagination is more important than knowledge."
Reply by Jon Harris March 22, 2005
"robert bristow-johnson" <rbj@audioimagination.com> wrote in message
news:BE65D67E.57D3%rbj@audioimagination.com...
> in article 3ab7k9F68uqeeU1@individual.net, Jon Harris at
> goldentully@hotmail.com wrote on 03/22/2005 13:49:
>
> > Also, using the 128-sample block example, you don't necessarily need to
> > restrict zooming out to be in 128-sample intervals.  The peak file could
> > be interpolated as necessary to create arbitrary views.
>
> i think that's bad.  each pixel maps to a contiguous segment of samples and
> you want the max value and the min value for that segment.  the peak file
> will not have the information of exactly where the max and min were nor if
> there was another peak that almost hit the max (but lost the contest with
> the real max).  that other peak (which ain't in the peak file) might become
> the true peak for a remapped segment of audio if you choose the zoom ratio
> to any arbitrary view unless the new view is a multiple of 128/pixel (of
> which you can figger out the max and min from the peak file).
>
> > I think simple "nearest-neighbor"
> > (zero-order) interpolation would be sufficient for this crude display-only
> > waveform.
>
> i don't think it would look so good.
Well, I know that CoolEdit and others do allow arbitrary zoom settings, and
the resulting displays look just fine.  But I don't know exactly how they do
it.  To get around some of the problems, when interpolating between peak
values, perhaps the largest one should be used.  That will at least make it
so you never show a peak value smaller than it actually is.

And keep in mind these types of "massively zoomed out" displays are usually
just rough pictures of the envelope to help you identify major features.  You
wouldn't typically be trying to select something as fine as 1 screen pixel
when zoomed out like that.

I just did a quick experiment with some really low-frequency sine waves
(1-5 Hz) in CoolEdit, and it does look a bit "chunky" when the peak file is
being used.  You can see when it switches to using the real audio data as the
waveform then becomes pristine.
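The "use the largest peak" idea for arbitrary zoom ratios could be sketched like this (a hypothetical illustration assuming a single max-absolute-value peak array, not how CoolEdit actually does it):

```python
import math

def view_peaks(peak_abs, width_px):
    """One displayed peak per pixel column, for any view width.

    Each pixel covers a fractional span of peak-file entries; taking
    the max over every entry the span touches guarantees the display
    never shows a peak smaller than the stored value.
    """
    n = len(peak_abs)
    view = []
    for px in range(width_px):
        lo = math.floor(px * n / width_px)
        hi = math.ceil((px + 1) * n / width_px)
        view.append(max(peak_abs[lo:hi]))
    return view
```

This trades accuracy for safety: adjacent pixels may share a peak entry, so the drawn envelope can only be too large, never too small.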
Reply by Jon Harris March 22, 2005
"Ben Bradley" <ben_nospam_bradley@frontiernet.net> wrote in message
news:t8r041dr1tcpk8stsk3con5shr3h7mktbh@4ax.com...
> On Tue, 22 Mar 2005 10:49:44 -0800, "Jon Harris"
> <goldentully@hotmail.com> wrote:
>
> >One other subtlety to mention, there are 2 different ways I've seen of dealing
> >with the bi-polar nature of audio.  You could store a single maximum absolute
> >value for each block, and then display the waveform as either a simple
> >positive-only envelope or as a symmetrical bi-polar waveform with the absolute
> >maximum value used for both the positive and negative value.  A second way would
> >be to store the maximum and minimum (most negative) values for each block, and
> >then draw the bi-polar waveform from that.  The first way cuts the storage
> >requirements in half, while the second way is a bit more accurate.
>
> I think it's important to do the second way, as many sounds
> (especially the most common things recorded, voice and musical
> instruments) are asymetrical, and if you do only one half of the
> waveform, it could be heavily clipped on the other half and you
> wouldn't know it.
To do it right, you would store the greater of the positive and negative peaks (i.e. the max of the absolute value). Then you would never have clipping that doesn't show up. But I agree the second method is superior since it provides more information than the first.
Reply by Ben Bradley March 22, 2005
On Tue, 22 Mar 2005 10:49:44 -0800, "Jon Harris"
<goldentully@hotmail.com> wrote:



>One other subtlety to mention, there are 2 different ways I've seen of dealing
>with the bi-polar nature of audio.  You could store a single maximum absolute
>value for each block, and then display the waveform as either a simple
>positive-only envelope or as a symmetrical bi-polar waveform with the absolute
>maximum value used for both the positive and negative value.  A second way would
>be to store the maximum and minimum (most negative) values for each block, and
>then draw the bi-polar waveform from that.  The first way cuts the storage
>requirements in half, while the second way is a bit more accurate.
I think it's important to do the second way, as many sounds
(especially the most common things recorded, voice and musical
instruments) are asymmetrical, and if you do only one half of the
waveform, it could be heavily clipped on the other half and you
wouldn't know it.
>Also, using the 128-sample block example, you don't necessarily need to restrict
>zooming out to be in 128-sample intervals.  The peak file could be interpolated
>as necessary to create arbitrary views.  I think simple "nearest-neighbor"
>(zero-order) interpolation would be sufficient for this crude display-only
>waveform.
I agree and I'd think this is how they do it.

-----
http://mindspring.com/~benbradley
Reply by Ben Bradley March 22, 2005
On 21 Mar 2005 16:14:07 -0800, "seijin@gmail.com" <seijin@gmail.com>
wrote:

>Actually, it does make sense.  But if zooming in, I would have to
>reanalyze the whole file, wouldn't I?  Because the user would want the
>ability to scroll through the whole file at a fine detail.  So maybe
>they zoom in far enough where 1 pixel is equal to 50 samples - wouldn't
>I need to reanalyze the whole file by grabbing 50 samples, finding the
>min & max and then plotting?
>
>I can see that zooming out wouldn't be a problem since you're just
>pretending that 128 samples is really 1 sample.  So they zoom out once
>and now 256 samples is equal to 1 sample.  And instead of re-reading
>the whole file to get the min and max of 256 samples you'd just get the
>minimum and maximum of the first two "blocks" of 128 samples, right?
>And then it should be fine zooming back in as long as they only zoom
>into a detail of 128 samples/pixel as that should be loaded into memory
>at that detail.
>
>Am I on the same level?
Yes, this appears to be how most audio editors work, generating some sort of
file that's much smaller than the original, but that effectively has the
envelope, and this peak file is used for fast displays of larger portions of
the file.

Cool Edit and N-Track Studio (and probably Goldwave, but I don't remember) do
all their scanning and generation of the peak file when a file is opened or
as it's recorded, and leave the file with the name of the .wav file but with
something like a .pk extension.

Also, cdwave(.com) does this scanning, but apparently only uses one file for
peak data.  If you reload the last file it's much faster, but if you load a
different file and then the first file, it takes the 'regular' slow time to
scan.

If I were writing something like that it would be tempting to decode one of
the file formats and use it, but there might be some legal problems with
doing that.

This sort of thing is probably well-documented somewhere, perhaps
harmony-central.com.  Ask the guy at cdwave.com how he does it (even though
there are only two display sizes).  I recall a poster here (comp.dsp) years
ago saying he was writing a unix/linux audio editor, maybe you could hunt him
down and ask him.

-----
http://mindspring.com/~benbradley
Reply by robert bristow-johnson March 22, 2005
in article 3ab7k9F68uqeeU1@individual.net, Jon Harris at
goldentully@hotmail.com wrote on 03/22/2005 13:49:

> One other subtlety to mention, there are 2 different ways I've seen of dealing
> with the bi-polar nature of audio.  You could store a single maximum absolute
> value for each block, and then display the waveform as either a simple
> positive-only envelope or as a symmetrical bi-polar waveform with the absolute
> maximum value used for both the positive and negative value.  A second way
> would be to store the maximum and minimum (most negative) values for each
> block, and then draw the bi-polar waveform from that.  The first way cuts the
> storage requirements in half, while the second way is a bit more accurate.
i like the second way. for each pixel, you draw a vertical line from the max value to the min value (for the entire block corresponding to that pixel). it works all the way down to 1 sample/pixel. very consistent in behavior.
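That per-pixel vertical-line drawing might look like the following sketch.  It is toolkit-agnostic, just computing each line's endpoints in pixel coordinates; the names and the coordinate convention (y grows downward, mid-height is zero) are assumptions, and it assumes at least one sample per column:

```python
def column_lines(samples, width_px, height_px, full_scale):
    """For each pixel column, find the min and max of the sample
    segment it covers and return the (y_top, y_bottom) endpoints of
    the vertical line to draw.  Works down to 1 sample per pixel."""
    n = len(samples)
    mid = height_px / 2
    lines = []
    for x in range(width_px):
        seg = samples[x * n // width_px:(x + 1) * n // width_px]
        hi, lo = max(seg), min(seg)
        lines.append((mid - hi * mid / full_scale,
                      mid - lo * mid / full_scale))
    return lines
```

Zoomed in past 1 sample/pixel you would switch to drawing the actual waveform instead.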
> Also, using the 128-sample block example, you don't necessarily need to
> restrict zooming out to be in 128-sample intervals.  The peak file could be
> interpolated as necessary to create arbitrary views.
i think that's bad.  each pixel maps to a contiguous segment of samples and
you want the max value and the min value for that segment.  the peak file
will not have the information of exactly where the max and min were nor if
there was another peak that almost hit the max (but lost the contest with
the real max).  that other peak (which ain't in the peak file) might become
the true peak for a remapped segment of audio if you choose the zoom ratio
to any arbitrary view unless the new view is a multiple of 128/pixel (of
which you can figger out the max and min from the peak file).
> I think simple "nearest-neighbor"
> (zero-order) interpolation would be sufficient for this crude display-only
> waveform.
i don't think it would look so good.

--

r b-j rbj@audioimagination.com

"Imagination is more important than knowledge."
Reply by Jon Harris March 22, 2005
CoolEdit uses this general technique, and they call them peak files (.pk
extension), so that is probably where the terminology came from (not Peak Audio
who make CobraNet).

It sounds like you guys already have this pretty well nailed down, but to
summarize, the basic idea is that the peak file is a down-sampled version of the
original audio, but the downsampling process takes the maximum absolute value of
each block of say 128 samples, rather than using traditional
filtering/decimation process.  The peak file is used to draw the waveform when
you are zoomed out, and when you are zoomed in, you only need a small portion of
the audio file, so you use the actual audio data.
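The down-sampling step described here can be sketched in a few lines of Python (the 128-sample block size is from the thread; this is the single max-absolute-value variant, and the function name is made up):

```python
def peak_file(samples, block_size=128):
    """Down-sample by taking the maximum absolute value of each block
    of samples, rather than filtering/decimating as a resampler would."""
    return [max(abs(s) for s in samples[i:i + block_size])
            for i in range(0, len(samples), block_size)]
```

For 30 minutes of 44.1 kHz audio this reduces roughly 80 million samples to about 620,000 peak values per channel.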

The peak file is created either during recording, or if you are opening an
existing file, when the file is opened.  CoolEdit will save the peak files along
with the original .wav file so that on subsequent opens, it need not be
recalculated.  They must somehow deal with the problem of keeping the peak file
in sync with the wave file when you copy/paste, change volume, etc..  A brute
force method would be to simply rescan the whole file after every change and
recreate the peak file anew--simple, fool-proof, but very inefficient especially
with large files.  Smarter algorithms that only update the portion affected are
probably used, and the rescanning is probably built into the processing routines
so the audio data only needs to be accessed once.

One other subtlety to mention, there are 2 different ways I've seen of dealing
with the bi-polar nature of audio.  You could store a single maximum absolute
value for each block, and then display the waveform as either a simple
positive-only envelope or as a symmetrical bi-polar waveform with the absolute
maximum value used for both the positive and negative value.  A second way would
be to store the maximum and minimum (most negative) values for each block, and
then draw the bi-polar waveform from that.  The first way cuts the storage
requirements in half, while the second way is a bit more accurate.

Also, using the 128-sample block example, you don't necessarily need to restrict
zooming out to be in 128-sample intervals.  The peak file could be interpolated
as necessary to create arbitrary views.  I think simple "nearest-neighbor"
(zero-order) interpolation would be sufficient for this crude display-only
waveform.

"robert bristow-johnson" <rbj@audioimagination.com> wrote in message
news:BE64DA98.576D%rbj@audioimagination.com...
> in article 1111450446.938466.45180@f14g2000cwb.googlegroups.com,
> seijin@gmail.com at seijin@gmail.com wrote on 03/21/2005 19:14:
>
> > Actually, it does make sense.
> ...
> > I can see that zooming out wouldn't be a problem since you're just
> > pretending that 128 samples is really 1 sample.  So they zoom out once
> > and now 256 samples is equal to 1 sample.  And instead of re-reading
> > the whole file to get the min and max of 256 samples you'd just get the
> > minimum and maximum of the first two "blocks" of 128 samples, right?
>
> exactly.  and as long as your wider zoom ratio is a multiple of 128 samples
> per pixel, you need not look at the audio file at all.  just get your min
> and max from the "peak file".
>
> > And then it should be fine zooming back in as long as they only zoom
> > into a detail of 128 samples/pixel as that should be loaded into memory
> > at that detail.
>
> yeah, i guess 4 meg (for 30 minutes of audio) isn't too bad to load into
> memory.
>
> > Am I on the same level?
>
> i think so.
>
> ...
>
> > But if zooming in, I would have to reanalyze the whole file, wouldn't I?
>
> no, i don't so.
>
> > Because the user would want the
> > ability to scroll through the whole file at a fine detail.  So maybe
> > they zoom in far enough where 1 pixel is equal to 50 samples - wouldn't
> > I need to reanalyze the whole file by grabbing 50 samples, finding the
> > min & max and then plotting?
>
> but i don't see why you think you need to do that for the *whole* audio
> file.  as the user presses the scroll left or scroll right arrows, the
> display is moved some amount to the right or left (respectively) with some
> of it "falling off the edge" and there is this hole in the display you have
> to fill in.  only the audio for that hole needs be analyzed for a min and
> max per pixel.  not the whole audio file.
>
> --
>
> r b-j rbj@audioimagination.com
>
> "Imagination is more important than knowledge."