Higher upsampling with minimum phase downsampling produces more aliasing

Started by jungledmnc July 4, 2014
Actually I cannot agree with a few things.

- 63 harmonics is waaaaay not enough - I actually have a harmonic based
generator, which uses 256 harmonics and can convert from/to "shape". So for
example if I draw a "perfect" sawtooth and convert it to 256 harmonics and
synthesize it back it sounds different. Not very, but it is clearly missing
the highest harmonics if the pitch is low enough.

- A 1024-sample wavetable is not good enough. You kinda got me experimenting
:), so I did some measurements. With linear interpolation even 2048
samples produced much more distortion than 8192. It was probably beyond
hearing limits (like -120dB or something), but with some postprocessing it
can easily become audible. However, it got much better with cubic
interpolation. Anyway, I wasn't doing hearing tests, so I cannot really say,
but it was very easy to measure the difference - there were just weird
inharmonic peaks (even below the fundamental). Not sure if it was aliasing or
something; oversampling did help, but it was much more effective to use
cubic interpolation than to oversample.


When it comes to oversampling, then I cannot really afford too much. Say 4x
oversampling is reasonable, 512x absolutely not. Generating the upsampled
points isn't such a big deal, but the filtering is. And it must be
zero-latency, because it is realtime.


The "arithmetics" - I think I understand it now. So the idea is that when
we limit our hearing to 20k and have Nyquist at say 24k, then there is 8k
space, which we can fill with any mess, including alias and we are ok. So
we can synthesize any pitches where the highest harmonic fits interval
20k-28k. Correct?

Personally I'm not sure about the theory, that we can totally ignore
everything above 20k, after all there are headphones capable of reproducing
30kHz. Though I have never seen a human, who would hear above 18k and my
personal limit is about 17k too. So let's say so.

But there's a problem - the typical sampling rate is still 44100, and with
the nyquist approaching 20k the number of required bandlimited wavetables
grows exponentially. According to your formula this is about 4 per octave,
feasible, but quite a lot. But I'm definitely going to try some experiments
and get back here with the results.


I think I won't need the paper, I wouldn't have time to study it I'm
afraid, my computer is full of documents to study for years already :D. I'm
not in music-dsp mailing list, I'll check it out.

jungledmnc	 

_____________________________		
Posted through www.DSPRelated.com
On 7/9/14 5:24 PM, jungledmnc wrote:
> Actually I cannot agree with a few things.
>
> - 63 harmonics is waaaaay not enough - I actually have a harmonic based
> generator, which uses 256 harmonics and can convert from/to "shape". So for
> example if I draw a "perfect" sawtooth and convert it to 256 harmonics and
> synthesize it back it sounds different. Not very, but it is clearly missing
> the highest harmonics if the pitch is low enough.
what frequencies are those higher harmonics at?

think about it, middle C and the 256th harmonic.

then find yourself a good analog synth with a sawtooth you think is "perfect", and play that back through a brick-wall filter (if you can get something that approximates it) set to 20 kHz. and tell me then if you hear the difference. if you say you can, then it's time for blind testing.

you see, we don't hear "perfect" sawtooths. we hear the portion of those perfect sawtooths that fall within our range of hearing.
> - 1024 samples wavetable is not good enough. You kinda got me experimenting
> :), so I was doing some measurements. With linear interpolation even 2048
> samples was doing much more distortion than 8192.
what's the highest non-zero harmonic?
> It was probably beyond
> hearing limits (like -120dB or something), but with some postprocessing it
> can easily get audible.
be specific. what post processing?
> However it got much better with cubic interpolation.
which cubic interpolation? Lagrange? Hermite? B-spline? what is the oversampling ratio? (which is half the wavetable size divided by the index of the highest non-zero harmonic. and how loud *is* that harmonic?)
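[Editor's sketch] The oversampling ratio defined in the reply above (half the wavetable size divided by the index of the highest non-zero harmonic) is easy to tabulate. The function name and the example table sizes below are illustrative assumptions, chosen to match the sizes mentioned in this thread.

```python
def oversampling_ratio(table_size: int, highest_harmonic: int) -> float:
    """Half the table size divided by the index of the highest non-zero harmonic."""
    if highest_harmonic < 1:
        raise ValueError("highest_harmonic must be at least 1")
    return (table_size / 2) / highest_harmonic


# Table sizes and harmonic counts taken from the discussion; any pair works.
for size, harmonics in [(2048, 256), (8192, 256), (8192, 20)]:
    ratio = oversampling_ratio(size, harmonics)
    print(f"N={size:5d}  K={harmonics:3d}  ratio={ratio:.1f}")
```

A larger ratio means the table content sits further below the table's own Nyquist, which is what makes a cheap interpolator adequate.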
> Anyway I wasn't doing hearing tests, so I cannot say really,
> but it was very easy to measure the difference
oh. that explains it. :-\
> - there were just weird
> inharmonic peaks (even below fundamental). Not sure if it was aliasing of
> something, oversampling did help, but it was much more effective to use
> cubic interpolation than oversampling it.
keep experimenting.
> When it comes to oversampling, then I cannot really afford too much. Say 4x
> oversampling is reasonable, 512x absolutely not.
you missed the point. the oversampling is accomplished by having a wavetable of N points with very little or *no* energy in the harmonics with indices anywhere close to N/2.
> Generating the upsampled
> points isn't such a big deal, but the filtering is.
> And it must be zero-latency, because it is realtime.
so a 32-sample delay (from using 16-sample double buffering) is gonna make it not realtime? "realtime" does not mean the same as "live". but even "live" can handle a little delay. you get 44 samples of delay for every foot you stand away from your amp or monitor speakers.
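[Editor's sketch] The delay-per-foot figure can be checked with a few lines of arithmetic. The speed of sound used here (343 m/s, roughly dry air at 20 degrees C) is an assumption, not a value from the thread. Under that assumption one foot comes out a little under 1 ms, so "44 samples per foot" reads as a rounded 1-ms-per-foot rule of thumb.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: dry air at about 20 degrees C
METERS_PER_FOOT = 0.3048


def delay_in_samples(distance_feet: float, sample_rate: float) -> float:
    """Acoustic propagation delay over distance_feet, expressed in samples."""
    seconds = distance_feet * METERS_PER_FOOT / SPEED_OF_SOUND_M_PER_S
    return seconds * sample_rate


for rate in (44100.0, 48000.0):
    print(f"{rate:.0f} Hz: {delay_in_samples(1.0, rate):.1f} samples per foot")
```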
> The "arithmetics" - I think I understand it now. So the idea is that when
> we limit our hearing to 20k and have Nyquist at say 24k, then there is 8k
> space, which we can fill with any mess, including alias and we are ok.
it's a 4 kHz space.
> So we can synthesize any pitches where the highest harmonic fits interval
> 20k-28k. Correct?
>
> Personally I'm not sure about the theory, that we can totally ignore
> everything above 20k, after all there are headphones capable of reproducing
> 30kHz.
better do some blind testing. with false negatives to keep the test subjects honest. but they've done it before.
> Though I have never seen a human, who would hear above 18k and my
> personal limit is about 17k too. So let's say so.
then, there you go. you're not hearing harmonics above 18 kHz, you wouldn't know if they're missing above 18 kHz, nor if they're aliased and remain above 18 kHz.
> But there's a problem - the typical sampling rate is still 44100, and with
> the nyquist approaching 20k the number of required bandlimited wavetables
> grows exponentially. According to your formula this is about 4 per octave,
> feasible, but quite a lot.
bump it down to 18 kHz, then you're back to 2.
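[Editor's sketch] The formula referred to is not quoted in this part of the thread, so the version below is a reconstruction, not the original. It assumes a table whose top harmonic sits at the audible limit can be transposed upward until that harmonic reaches (sample rate - audible limit), because only beyond that point does its alias fold back below the limit. That assumption reproduces both figures mentioned here (about 4 tables per octave at a 20 kHz limit, about 2 at 18 kHz, both at 44.1 kHz).

```python
import math


def tables_per_octave(sample_rate: float, audible_limit: float) -> float:
    """Wavetables needed per octave under the assumed transposition rule."""
    # Ratio by which one table may be pitched up before aliases become audible.
    usable_ratio = (sample_rate - audible_limit) / audible_limit
    if usable_ratio <= 1.0:
        raise ValueError("audible_limit must be below the Nyquist frequency")
    return 1.0 / math.log2(usable_ratio)


for limit in (20000.0, 18000.0):
    count = tables_per_octave(44100.0, limit)
    print(f"limit {limit:.0f} Hz at 44.1 kHz: {count:.2f} tables per octave")
```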
> But I'm definitely going to try some experiments
> and get back here with the results.
>
> I think I won't need the paper, I wouldn't have time to study it I'm
> afraid, my computer is full of documents to study for years already :D. I'm
> not in music-dsp mailing list, I'll check it out.
highly recommended.

--
r b-j   rbj@audioimagination.com
"Imagination is more important than knowledge."
>> - 63 harmonics is waaaaay not enough - I actually have a harmonic based
>> generator, which uses 256 harmonics and can convert from/to "shape". So for
>> example if I draw a "perfect" sawtooth and convert it to 256 harmonics and
>> synthesize it back it sounds different. Not very, but it is clearly missing
>> the highest harmonics if the pitch is low enough.
>
>what frequencies are those higher harmonics at?
>
>think about it, middle C and the 256th harmonic.
>
>then find yourself a good analog synth with a sawtooth you think is
>"perfect", and play that back through a brick-wall filter (if you can
>get something that approximates it) set to 20 kHz. and tell me then if
>you hear the difference. if you say you can, then it's time for blind
>testing.
>
>you see, we don't hear "perfect" sawtooths. we hear the portion of
>those perfect sawtooths that fall within our range of hearing.
Yes, but with the pitch of say 70Hz, a nice dubstep subbass :), you are at 70 * 256 = almost 18k, which is still audible. And for rendering drums or even "bassier" tones, we can easily get to 40-50Hz. So even 256 harmonics are not enough to represent all harmonics.
>> - 1024 samples wavetable is not good enough. You kinda got me experimenting
>> :), so I was doing some measurements. With linear interpolation even 2048
>> samples was doing much more distortion than 8192.
>
>what's the highest non-zero harmonic?
That depends of course, for sawtooth it is unlimited, right?
>> It was probably beyond
>> hearing limits (like -120dB or something), but with some postprocessing it
>> can easily get audible.
>
>be specific. what post processing?
It can be absolutely anything. Different kinds of distortion, filtering, level compression... It's really easy to amplify the "dirt" in the signal. Many do it intentionally.
>> However it got much better with cubic interpolation.
>
>which cubic interpolation? Lagrange? Hermite? B-spline?
I think it is Hermite.
>what is the oversampling ratio? (which is half the wavetable size
>divided by the index of the highest non-zero harmonic. and how loud
>*is* that harmonic?)
I use 8192 point wavetables, which may either be rendered using 256 harmonics, or directly by "shape", so it can be a "perfect" sawtooth, with all harmonics. We could say that I'm going to accept a minimum pitch of say 40Hz. With a 20k high limit, there need to be 500 harmonics then.

Anyway some more tests:

Sawtooth wave, 171Hz (C4), 44kHz sampling rate, analysed using FFT 65536 points, hann window, checking up to -150dB:

1) bandlimited (1 per oct), 3x oversampling (to exceed 96k), downsampling using minphase 72dB/oct
   cubic interpolation => crystal clear
   linear interpolation => alias (probably), e.g. 670Hz, -110dB

2) bandlimited (1 per oct), no oversampling
   cubic interpolation => clear, alias can be measured as the residue from the top octave in the bandlimited wavetable, but I don't hear a difference
   linear interpolation => same as cubic, but an additional line of aliased frequencies, e.g. again 670Hz, -110dB

3) no bandlimit
   cubic & linear interpolation cannot be really distinguished in all that alias, e.g. 150Hz, -50dB

So the linear interpolation really isn't enough even for big wavetables. With 2048 point wavetable and cubic interpolation with 3x oversampling, the results were similar to when linear-interpolation was used, but it was even worse.
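[Editor's sketch] A rough way to reproduce the linear-versus-cubic comparison without an FFT analyser is to read a bandlimited sawtooth table at off-grid positions and compare against the exact sum of harmonics. This is not the poster's test setup. The table sizes are scaled down (512 and 2048 points with 64 harmonics, the same size-to-harmonic ratios as 2048 and 8192 points with 256 harmonics) to keep it fast, and the cubic reader is a 4-point Catmull-Rom Hermite, which is only an assumption about which "Hermite" was meant.

```python
import math


def saw_exact(phase, harmonics):
    """Bandlimited sawtooth (harmonics decaying as 1/k); phase is in cycles."""
    return sum(math.sin(2 * math.pi * k * phase) / k for k in range(1, harmonics + 1))


def saw_table(size, harmonics):
    return [saw_exact(n / size, harmonics) for n in range(size)]


def read_linear(table, pos):
    i = int(pos)
    frac = pos - i
    y0 = table[i % len(table)]
    y1 = table[(i + 1) % len(table)]
    return y0 + frac * (y1 - y0)


def read_hermite(table, pos):
    """4-point, 3rd-order Hermite (Catmull-Rom) read; an assumed variant."""
    n = len(table)
    i = int(pos)
    frac = pos - i
    ym1, y0, y1, y2 = (table[(i + d) % n] for d in (-1, 0, 1, 2))
    c1 = 0.5 * (y1 - ym1)
    c2 = ym1 - 2.5 * y0 + 2.0 * y1 - 0.5 * y2
    c3 = 0.5 * (y2 - ym1) + 1.5 * (y0 - y1)
    return ((c3 * frac + c2) * frac + c1) * frac + y0


def error_power_db(reader, size, harmonics, points=2003):
    """Mean squared interpolation error over one cycle, in dB."""
    table = saw_table(size, harmonics)
    total = 0.0
    for m in range(points):
        phase = m / points
        err = reader(table, phase * size) - saw_exact(phase, harmonics)
        total += err * err
    return 10.0 * math.log10(total / points)


for size in (512, 2048):
    lin = error_power_db(read_linear, size, 64)
    cub = error_power_db(read_hermite, size, 64)
    print(f"N={size}: linear {lin:.1f} dB, hermite {cub:.1f} dB")
```

The absolute numbers depend on the harmonic count and on how the error is referenced, so they should not be compared directly with the -110 dB figures above; only the ordering is meaningful.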
>> When it comes to oversampling, then I cannot really afford too much. Say 4x
>> oversampling is reasonable, 512x absolutely not.
>
>you missed the point. the oversampling is accomplished by having a
>wavetable of N points with very little or *no* energy in the harmonics
>with indices anywhere close to N/2.
Aaaaah ok! Anyway the band limiting with 8192 samples (even if there are just say 20 harmonics!) seems working very well. But still, if I decrease the wavetable size,
>> And it must be zero-latency, because it is realtime.
>
>so a 32-sample delay (from using 16-sample double buffering) is gonna
>make it not realtime?
>
>"realtime" does not mean the same as "live". but even "live" can handle
>a little delay. you get 44 samples of delay for every foot you stand
>away from your amp or monitor speakers.
I cannot fully agree. 44 samples is 1 millisecond and every millisecond is relevant. Of course, we can live with it, but if we can avoid it... The minimum phase filter should be fine with it.

Btw. where did you get the "32-sample" delay from? You mean linear-phase filtering with a 32 point FIR? I don't know, but I'd say it can hardly be that steep with so few points, or could it? I'm generally using 512 points or even 2048 points if I need linear-phase.
>> The "arithmetics" - I think I understand it now. So the idea is that when
>> we limit our hearing to 20k and have Nyquist at say 24k, then there is 8k
>> space, which we can fill with any mess, including alias and we are ok.
>
>it's a 4 kHz space.
Hmmm, then I'm missing something - let's say I create a wavetable for generating pitches from 100Hz and the limit 20k, sampling rate 48k, so Nyquist 24k. Then if I generate 100 * (28000/20000) = 140 Hz pitch, then the highest harmonic will be at 28k, which will alias from 24k to 20k, right? So where am I wrong?
>> Personally I'm not sure about the theory, that we can totally ignore
>> everything above 20k, after all there are headphones capable of reproducing
>> 30kHz.
>
>better do some blind testing. with false negatives to keep the test
>subjects honest.
That's a really hard thing to do, because many people are "trained". There are even apps to train the ears, online. And it also depends on the situation - e.g. if you are in a noisy environment, the high frequency resolution gets lower. If you don't sleep well, same thing. So the listening tests are good as a "guide", but saying "we can easily create anything above 20k" just because neither I nor anybody around me can hear it isn't such a good idea.
>bump it down to 18 kHz, then you're back to 2.
Not really a good idea, some people just have better ears, and they can measure it... some really do. Simply put, I cannot judge ears for other people. It's even possible our hearing will get better in the future with meds and stuff. And imagine someone makes some music that will be "ugly" in the future, because suddenly people will hear above 20k. I know it's a slightly extreme attitude, but still...

Though your idea of having more band limited wavetables is probably the only good solution. Though even with 1 table / oct it looks good now.

jungledmnc
On Thu, 10 Jul 2014 09:15:38 -0500, "jungledmnc"
<34728@dsprelated> wrote:

<snip>
>Hmmm, then I'm missing something - let's say I create a wavetable for
>generating pitches from 100Hz and the limit 20k, sampling rate 48k, so
>Nyquist 24k. Then if I generate 100 * (28000/20000) = 140 Hz pitch, then
>the highest harmonic will be at 28k, which will alias from 24k to 20k,
>right? So where am I wrong?
I may be missing something as well, but doing this as a (simple-minded) thought experiment I imagine a table holding one cycle of a ramp wave. You change the output frequency by changing the step size through the table, while keeping the step rate fixed at the sample rate.

Since this is a linear ramp, you can use simple linear interpolation for steps that land between table values. As you increase the step size, you traverse around the table faster, but the wave shape doesn't change... it still ramps up to maximum and abruptly drops to minimum and repeats.

So if the anti-alias filter is happy with that abrupt drop (and resultant spectral splatter) at low output frequencies, wouldn't it be happy at higher frequencies as well?

Best regards,

Bob Masta

DAQARTA v7.60
Data AcQuisition And Real-Time Analysis
www.daqarta.com
Scope, Spectrum, Spectrogram, Sound Level Meter
Frequency Counter, Pitch Track, Pitch-to-MIDI
FREE Signal Generator, DaqMusiq generator
Science with your sound card!
On 7/12/14 8:26 AM, Bob Masta wrote:
> On Thu, 10 Jul 2014 09:15:38 -0500, "jungledmnc"
> <34728@dsprelated> wrote:
>
> <snip>
>> Hmmm, then I'm missing something - let's say I create a wavetable for
>> generating pitches from 100Hz and the limit 20k, sampling rate 48k, so
>> Nyquist 24k. Then if I generate 100 * (28000/20000) = 140 Hz pitch, then
>> the highest harmonic will be at 28k, which will alias from 24k to 20k,
>> right? So where am I wrong?
>
> I may be missing something as well, but doing this as a
> (simple-minded) thought experiment I imagine a table holding
> one cycle of a ramp wave.
it *should* be a bandlimited ramp wave.
> You change the output frequency
> by changing the step size through the table, while keeping
> the step rate fixed at the sample rate.
>
> Since this is a linear ramp,
but it isn't exactly that. it is an approximation of the linear ramp with a finite number of non-zero harmonics.

start with an actual linear ramp (with harmonics that decay as 1/k) and zero the coefficients of all harmonics above the Kth harmonic (i have changed the notation from the "Nth" harmonic, because "N" is now gonna be the FFT length). so all FFT bins between K and N-K are set to zero, then inverse FFT. that's the waveform.

now at lower pitches, when K is a larger value, the waveform ramp will look more linear. but at higher pitches K is smaller, there are fewer harmonics, and the waveform will look a little sloppier.

in all cases the edge is a little bit sloppy compared to a perfect analog (more precisely "continuous-time") waveform. but if you pass that perfect analog waveform through a perfect analog brick-wall filter set to, say 20 kHz (or for deafies like me, even a little lower), no one will hear it differently.

our hearing is more sophisticated than a simple linear Fourier analysis machine, but we *do* have finite hearing range. there is *some* limit such that if all frequencies above that limit are removed, we don't hear it. again, if someone claims that they hear the difference between the sawtooth with harmonics that go to say 30 kHz, and another identical sawtooth with the same harmonics up to, say, 19 kHz, if they claim that, there are methods of blind testing, complete with false negatives and false positives (to keep us all honest).
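[Editor's sketch] The recipe above (take a linear ramp, zero every FFT bin strictly between K and N-K, inverse transform) can be written out directly. To stay dependency-free this uses a naive O(N^2) DFT, which is only sensible for small tables; a real implementation would use an FFT library. The function names and the 64-point, 8-harmonic example are assumptions for illustration.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform, adequate for a small table."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT matching dft() above."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def bandlimited_ramp(size, top_harmonic):
    """One cycle of a ramp keeping only harmonics 1..top_harmonic (plus DC)."""
    if not 0 < top_harmonic < size / 2:
        raise ValueError("need 0 < top_harmonic < size / 2")
    ramp = [2.0 * t / size - 1.0 for t in range(size)]
    spectrum = dft(ramp)
    # Zero the bins strictly between K and N-K, as described in the post.
    for k in range(top_harmonic + 1, size - top_harmonic):
        spectrum[k] = 0.0
    # The zeroing is conjugate-symmetric, so the imaginary part is round-off only.
    return [value.real for value in idft(spectrum)]


table = bandlimited_ramp(64, 8)
print(" ".join(f"{v:+.2f}" for v in table[:8]))
```

Printing or plotting tables for several values of K shows the edge getting "sloppier" as K drops, as the post describes.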
> you can use simple linear
> interpolation for steps that land between table value.
even with a little curvature, linear interpolation works quite well when the Nyquist frequency is much much higher than the highest harmonic. we have come up with mathematical expressions that compute the entire energy of the images, that in worst case, can all fold back into the baseband.
> As
> you increase the step size, you traverse around the table
> faster, but the wave shape doesn't change... it still ramps
> up to maximum and abruptly drops to minimum and repeats.
how abrupt depends on the number of non-zero harmonics and the step size (what i like to sometimes call the "stride"). but for higher pitches, there are fewer non-zero harmonics, but the stride is also larger, so it's about the same for lower or higher pitches.
> So if the anti-alias filter is happy with that abrupt drop
> (and resultant spectral splatter) at low output frequencies,
> wouldn't it be happy at higher frequencies as well?
if you set it up correctly (using different wavetables for lower pitches than for higher pitches), it comes out just as well for lower or higher pitches.

--
r b-j   rbj@audioimagination.com
"Imagination is more important than knowledge."
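[Editor's sketch] One way to see why "it's about the same for lower or higher pitches" is to compute how far the top harmonic advances per output sample. With K chosen as the number of harmonics that fit under a limit, K times the stride, divided by the table size, stays close to limit / sample rate regardless of pitch. The 20 kHz limit, 48 kHz rate and 2048-point table are assumed values, and the function names are hypothetical.

```python
def stride(table_size, freq_hz, sample_rate):
    """Table positions advanced per output sample."""
    return table_size * freq_hz / sample_rate


def top_harmonic(freq_hz, limit_hz):
    """Largest harmonic index whose frequency does not exceed limit_hz."""
    return int(limit_hz // freq_hz)


def top_harmonic_cycles_per_sample(freq_hz, limit_hz, sample_rate):
    """Cycles of the highest kept harmonic per output sample."""
    return top_harmonic(freq_hz, limit_hz) * freq_hz / sample_rate


for pitch in (55.0, 110.0, 440.0, 1760.0):
    k = top_harmonic(pitch, 20000.0)
    s = stride(2048, pitch, 48000.0)
    c = top_harmonic_cycles_per_sample(pitch, 20000.0, 48000.0)
    print(f"{pitch:7.1f} Hz: K={k:3d}  stride={s:7.3f}  top harmonic {c:.4f} cycles/sample")
```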
> Hmmm, then I'm missing something - let's say I create a wavetable for
> generating pitches from 100Hz and the limit 20k, sampling rate 48k, so
> Nyquist 24k. Then if I generate 100 * (28000/20000) = 140 Hz pitch, then
> the highest harmonic will be at 28k, which will alias from 24k to 20k,
> right? So where am I wrong?
Just to clarify - could you check this r-b-j? I just don't see where I am wrong.
On Saturday, July 12, 2014 8:26:39 AM UTC-4, Bob Masta wrote:
> On Thu, 10 Jul 2014 09:15:38 -0500, "jungledmnc"
> <34728@dsprelated> wrote:
>
> <snip>
> >Hmmm, then I'm missing something - let's say I create a wavetable for
> >generating pitches from 100Hz and the limit 20k, sampling rate 48k, so
> >Nyquist 24k. Then if I generate 100 * (28000/20000) = 140 Hz pitch, then
> >the highest harmonic will be at 28k, which will alias from 24k to 20k,
> >right? So where am I wrong?
>
> I may be missing something as well, but doing this as a
> (simple-minded) thought experiment I imagine a table holding
> one cycle of a ramp wave. You change the output frequency
> by changing the step size through the table, while keeping
> the step rate fixed at the sample rate.
>
> Since this is a linear ramp, you can use simple linear
> interpolation for steps that land between table value. As
> you increase the step size, you traverse around the table
> faster, but the wave shape doesn't change... it still ramps
> up to maximum and abruptly drops to minimum and repeats.
I think I read somewhere of using a virtual table index, i.e. one that is larger than your actual table, so that it reduces the quantization error in the index that can build up over time.

There was also a technique of using 2 tables. The first one has coarse granularity and the 2nd one has finer granularity but only between 1 step of the first table. Then by using trigonometric identities you can calculate cos(A+B) where A comes from the first table and B comes from the 2nd table.

Cheers,
Dave
On 7/16/14 4:17 AM, jungledmnc wrote:
>> Hmmm, then I'm missing something - let's say I create a wavetable for
>> generating pitches from 100Hz and the limit 20k, sampling rate 48k, so
>> Nyquist 24k. Then if I generate 100 * (28000/20000) = 140 Hz pitch, then
>> the highest harmonic will be at 28k, which will alias from 24k to 20k,
>> right? So where am I wrong?
>
> Just to clarify - could you check this r-b-j? I just don't see where am I
> wrong.
nothing wrong. 200 harmonics in either case. with high-quality interpolation (better than linear), you would need a minimum of 401 samples in the wavetable. note that 140 Hz is about 1/2 octave above 100 Hz.

*semantically* i would still say that the guard band is 4 kHz, not 8. when you're sampling at 48 kHz and your Nyquist is at 24 kHz, there really *is* no 28 kHz.

--
r b-j   rbj@audioimagination.com
"Imagination is more important than knowledge."
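[Editor's sketch] The numbers in this exchange can be checked in a few lines: 200 harmonics for the 100 Hz table, a top harmonic at 28 kHz when that table is played at 140 Hz, folding to 20 kHz at a 48 kHz sample rate, and a minimum table length of 2K + 1 = 401. The helper names are hypothetical. The 2K + 1 rule is inferred from the "200 harmonics, 401 samples" statement, not quoted from the thread.

```python
import math


def folded_frequency(freq_hz, sample_rate):
    """Frequency (0..Nyquist) at which a sampled sinusoid of freq_hz appears."""
    f = freq_hz % sample_rate
    return sample_rate - f if f > sample_rate / 2 else f


def harmonics_below(limit_hz, pitch_hz):
    """Number of harmonics of pitch_hz at or below limit_hz."""
    return int(limit_hz // pitch_hz)


def min_table_size(harmonics):
    """Smallest table length that can hold the given number of harmonics."""
    return 2 * harmonics + 1


k = harmonics_below(20000, 100)
top = k * 140  # same table played at 140 Hz
print("harmonics:", k)
print("top harmonic at 140 Hz:", top, "Hz")
print("appears at:", folded_frequency(top, 48000), "Hz")
print("minimum table size:", min_table_size(k))
print("140 Hz above 100 Hz:", round(math.log2(140 / 100), 3), "octave")
```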
Aaaah, great, got it! Thank you!

jungledmnc	 

On 7/16/14 9:04 AM, Dave wrote:
> I think I read somewhere of using a virtual table index i.e. one that is larger than your actual table, so that it reduces the quantization error in the index that can build up over time.
>
> There was also a technique of using 2 table. The first one has course granularity and the 2nd one has finer granularity but only between 1 step of the first table. Then by using trigonometric identities you can calculate cos(A+B) where A comes from the first table and B comes from the 2nd table.
this would work for a single sine wave in the wavetable. i think i came across this before in the context of non-musical use (like it was an "NCO" or "DDS" or whatever they're calling it nowadaze). you can have a sin and cos table (it can be the same table) for "t" and then another sin and cos table for "delta_t" and do

    sin(t + delta_t) = cos(t)*sin(delta_t) + sin(t)*cos(delta_t)

where delta_t is smaller than the difference between adjacent t values in the first table.

for a general waveshape and some finite-order polynomial interpolation, you can have one table for each power of the "delta_t" portion and add up the power series real fast. in the case of 1st-order power series (a.k.a. linear interpolation), you can easily eliminate that second table with a mere subtraction.

--
r b-j   rbj@audioimagination.com
"Imagination is more important than knowledge."
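[Editor's sketch] A minimal version of the two-table sine lookup described above, using the angle-sum identity from the post. The table sizes (64 coarse steps, 64 fine steps per coarse step) and all names are assumptions for illustration. Separate sin and cos lists are used for clarity, although the post notes they can share one table.

```python
import math

COARSE_BITS = 6
FINE_BITS = 6
COARSE = 1 << COARSE_BITS          # coarse steps per cycle ("t")
FINE = 1 << FINE_BITS              # fine steps per coarse step ("delta_t")
TOTAL = COARSE * FINE              # resolvable phase positions per cycle

coarse_sin = [math.sin(2 * math.pi * i / COARSE) for i in range(COARSE)]
coarse_cos = [math.cos(2 * math.pi * i / COARSE) for i in range(COARSE)]
fine_sin = [math.sin(2 * math.pi * j / TOTAL) for j in range(FINE)]
fine_cos = [math.cos(2 * math.pi * j / TOTAL) for j in range(FINE)]


def sin_two_table(phase_index):
    """sin(2*pi*phase_index/TOTAL) from two small tables, with no interpolation."""
    phase_index %= TOTAL
    i = phase_index >> FINE_BITS       # coarse part of the phase
    j = phase_index & (FINE - 1)       # fine part of the phase
    # sin(t + delta_t) = cos(t)*sin(delta_t) + sin(t)*cos(delta_t)
    return coarse_cos[i] * fine_sin[j] + coarse_sin[i] * fine_cos[j]


worst = max(abs(sin_two_table(p) - math.sin(2 * math.pi * p / TOTAL))
            for p in range(TOTAL))
print("table entries:", 2 * (COARSE + FINE), "for", TOTAL, "phase positions")
print("worst-case error:", worst)
```

The trade is four small tables and two multiplies per sample in place of one table with TOTAL entries.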