comp.dsp | Higher upsampling with minimum phase downsampling produces more aliasing| page 2

Reply by jungledmnc ● July 9, 2014

Actually I cannot agree with a few things.

- 63 harmonics is waaaaay not enough - I actually have a harmonic based
generator, which uses 256 harmonics and can convert from/to "shape". So for
example if I draw a "perfect" sawtooth and convert it to 256 harmonics and
synthesize it back it sounds different. Not very, but it is clearly missing
the highest harmonics if the pitch is low enough.

- 1024 samples wavetable is not good enough. You kinda got me experimenting
:), so I was doing some measurements. With linear interpolation even 2048
samples was doing much more distortion than 8192. It was probably beyond
hearing limits (like -120dB or something), but with some postprocessing it
can easily get audible. However it got much better with cubic
interpolation. Anyway I wasn't doing hearing tests, so I cannot say really,
but it was very easy to measure the difference - there were just weird
inharmonic peaks (even below fundamental). Not sure if it was aliasing or
something, oversampling did help, but it was much more effective to use
cubic interpolation than oversampling it.


When it comes to oversampling, then I cannot really afford too much. Say 4x
oversampling is reasonable, 512x absolutely not. Generating the upsampled
points isn't such a big deal, but the filtering is. And it must be
zero-latency, because it is realtime.


The "arithmetics" - I think I understand it now. So the idea is that when
we limit our hearing to 20k and have Nyquist at say 24k, then there is 8k
space, which we can fill with any mess, including alias and we are ok. So
we can synthesize any pitches where the highest harmonic fits interval
20k-28k. Correct?

Personally I'm not sure about the theory, that we can totally ignore
everything above 20k, after all there are headphones capable of reproducing
30kHz. Though I have never seen a human, who would hear above 18k and my
personal limit is about 17k too. So let's say so.

But there's a problem - the typical sampling rate is still 44100, and with
the nyquist approaching 20k the number of required bandlimited wavetables
grows exponentially. According to your formula this is about 4 per octave,
feasible, but quite a lot. But I'm definitely going to try some experiments
and get back here with the results.


I think I won't need the paper, I wouldn't have time to study it I'm
afraid, my computer is full of documents to study for years already :D. I'm
not in music-dsp mailing list, I'll check it out.

jungledmnc	 

_____________________________		
Posted through www.DSPRelated.com

Reply by robert bristow-johnson ● July 9, 2014

On 7/9/14 5:24 PM, jungledmnc wrote:
> Actually I cannot agree with a few things.
>
> - 63 harmonics is waaaaay not enough - I actually have a harmonic based
> generator, which uses 256 harmonics and can convert from/to "shape". So for
> example if I draw a "perfect" sawtooth and convert it to 256 harmonics and
> synthesize it back it sounds different. Not very, but it is clearly missing
> the highest harmonics if the pitch is low enough.

what frequencies are those higher harmonics at?

think about it, middle C and the 256th harmonic.

then find yourself a good analog synth with a sawtooth you think is 
"perfect", and play that back through a brick-wall filter (if you can 
get something that approximates it) set to 20 kHz.  and tell me then if 
you hear the difference.  if you say you can, then it's time for blind 
testing.

you see, we don't hear "perfect" sawtooths.  we hear the portion of 
those perfect sawtooths that fall within our range of hearing.

>
> - 1024 samples wavetable is not good enough. You kinda got me experimenting
> :), so I was doing some measurements. With linear interpolation even 2048
> samples was doing much more distortion than 8192.

what's the highest non-zero harmonic?

> It was probably beyond
> hearing limits (like -120dB or something), but with some postprocessing it
> can easily get audible.

be specific.  what post processing?

> However it got much better with cubic interpolation.

which cubic interpolation?  Lagrange?  Hermite?  B-spline?

what is the oversampling ratio?  (which is half the wavetable size 
divided by the index of the highest non-zero harmonic.  and how loud 
*is* that harmonic?)
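
A minimal sketch of that ratio, taking the definition above literally (half the wavetable size divided by the index of the highest non-zero harmonic). The function name `oversampling_ratio` and the example figures are only illustrative; the table sizes and harmonic counts are the ones mentioned in this thread.

```python
def oversampling_ratio(table_size: int, highest_harmonic: int) -> float:
    """Half the table size divided by the index of the highest non-zero harmonic."""
    if highest_harmonic < 1:
        raise ValueError("highest_harmonic must be at least 1")
    return (table_size / 2) / highest_harmonic


# Figures from the discussion, used purely as illustration.
print(oversampling_ratio(8192, 256))  # 16.0
print(oversampling_ratio(2048, 256))  # 4.0
print(oversampling_ratio(1024, 63))   # a little over 8
```

The larger the ratio, the smoother the table looks between adjacent points, which is why a cheap interpolator can be good enough when the table is long and the harmonic count is modest.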

> Anyway I wasn't doing hearing tests, so I cannot say really,
> but it was very easy to measure the difference

oh.  that explains it.

:-\

> - there were just weird
> inharmonic peaks (even below fundamental). Not sure if it was aliasing or
> something, oversampling did help, but it was much more effective to use
> cubic interpolation than oversampling it.
>

keep experimenting.

>
> When it comes to oversampling, then I cannot really afford too much. Say 4x
> oversampling is reasonable, 512x absolutely not.

you missed the point.  the oversampling is accomplished by having a 
wavetable of N points with very little or *no* energy in the harmonics 
with indices anywhere close to N/2.

> Generating the upsampled
> points isn't such a big deal, but the filtering is.

> And it must be zero-latency, because it is realtime.
>

so a 32-sample delay (from using 16-sample double buffering) is gonna 
make it not realtime?

"realtime" does not mean the same as "live".  but even "live" can handle 
a little delay.  you get 44 samples of delay for every foot you stand 
away from your amp or monitor speakers.

>
> The "arithmetics" - I think I understand it now. So the idea is that when
> we limit our hearing to 20k and have Nyquist at say 24k, then there is 8k
> space, which we can fill with any mess, including alias and we are ok.

it's a 4 kHz space.

> So we can synthesize any pitches where the highest harmonic fits interval
> 20k-28k. Correct?
>
> Personally I'm not sure about the theory, that we can totally ignore
> everything above 20k, after all there are headphones capable of reproducing
> 30kHz.

better do some blind testing.  with false negatives to keep the test 
subjects honest.

but they've done it before.

> Though I have never seen a human, who would hear above 18k and my
> personal limit is about 17k too. So let's say so.

then, there you go.  you're not hearing harmonics above 18 kHz, you 
wouldn't know if they're missing above 18 kHz, nor if they're aliased 
and remain above 18 kHz.

>
> But there's a problem - the typical sampling rate is still 44100, and with
> the nyquist approaching 20k the number of required bandlimited wavetables
> grows exponentially. According to your formula this is about 4 per octave,
> feasible, but quite a lot.

bump it down to 18 kHz, then you're back to 2.
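
A rough sketch of where the "4 per octave" and "2 per octave" figures can come from. The formula itself is not shown on this page, so the model here is an assumption: one table is usable from the pitch where its highest harmonic sits at the audible limit up to the pitch where that harmonic reaches `sample_rate - audible_limit`, at which point its alias lands back on the audible limit. The names `octaves_per_table` and `tables_per_octave` are made up for this example.

```python
import math


def octaves_per_table(sample_rate: float, audible_limit: float) -> float:
    """Pitch range (in octaves) one bandlimited table can cover, under the
    assumption that aliases are harmless as long as they stay above
    audible_limit."""
    top = sample_rate - audible_limit  # highest harmonic frequency tolerated
    if top <= audible_limit:
        raise ValueError("no guard band: audible_limit must be below sample_rate / 2")
    return math.log2(top / audible_limit)


def tables_per_octave(sample_rate: float, audible_limit: float) -> float:
    return 1.0 / octaves_per_table(sample_rate, audible_limit)


for fs, limit in [(44100.0, 20000.0), (44100.0, 18000.0), (48000.0, 20000.0)]:
    print(fs, limit, round(tables_per_octave(fs, limit), 2))
```

Under this assumption 44.1 kHz with a 20 kHz limit comes out a bit under 4 tables per octave, and dropping the limit to 18 kHz brings it to roughly 2.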

> But I'm definitely going to try some experiments
> and get back here with the results.
>
>
> I think I won't need the paper, I wouldn't have time to study it I'm
> afraid, my computer is full of documents to study for years already :D. I'm
> not in music-dsp mailing list, I'll check it out.

highly recommended.

-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

Reply by jungledmnc ● July 10, 2014

>> - 63 harmonics is waaaaay not enough - I actually have a harmonic based
>> generator, which uses 256 harmonics and can convert from/to "shape". So for
>> example if I draw a "perfect" sawtooth and convert it to 256 harmonics and
>> synthesize it back it sounds different. Not very, but it is clearly missing
>> the highest harmonics if the pitch is low enough.
>
>what frequencies are those higher harmonics at?
>
>think about it, middle C and the 256th harmonic.
>
>then find yourself a good analog synth with a sawtooth you think is 
>"perfect", and play that back through a brick-wall filter (if you can 
>get something that approximates it) set to 20 kHz.  and tell me then if 
>you hear the difference.  if you say you can, then it's time for blind 
>testing.
>
>you see, we don't hear "perfect" sawtooths.  we hear the portion of 
>those perfect sawtooths that fall within our range of hearing.

Yes, but with the pitch of say 70Hz, a nice dubstep subbass :), you are at
70 * 256 = almost 18k, which is still audible. And for rendering drums or
even "bassier" tones, we can easily get to 40-50Hz. So even 256 harmonics
are not enough to represent all harmonics.



>> - 1024 samples wavetable is not good enough. You kinda got me experimenting
>> :), so I was doing some measurements. With linear interpolation even 2048
>> samples was doing much more distortion than 8192.
>
>what's the highest non-zero harmonic?

That depends of course, for sawtooth it is unlimited, right?


>> It was probably beyond
>> hearing limits (like -120dB or something), but with some postprocessing it
>> can easily get audible.
>
>be specific.  what post processing?

It can be absolutely anything. Different kinds of distortion, filtering,
level compression... It's really easy to amplify the "dirt" in the signal.
Many do it intentionally.


>> However it got much better with cubic interpolation.
>
>which cubic interpolation?  Lagrange?  Hermite?  B-spline?

I think it is hermite.
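
For reference, a minimal sketch of one common 4-point, 3rd-order Hermite interpolator (the Catmull-Rom variant). Whether this is the exact flavour used in the generator described here is an assumption; the names `hermite4` and `read_table_hermite` are made up for this example, and `phase` is assumed to be non-negative.

```python
import math


def hermite4(ym1: float, y0: float, y1: float, y2: float, frac: float) -> float:
    """Catmull-Rom style Hermite interpolation between y0 (frac=0) and y1 (frac=1)."""
    c0 = y0
    c1 = 0.5 * (y1 - ym1)
    c2 = ym1 - 2.5 * y0 + 2.0 * y1 - 0.5 * y2
    c3 = 0.5 * (y2 - ym1) + 1.5 * (y0 - y1)
    return ((c3 * frac + c2) * frac + c1) * frac + c0  # Horner evaluation


def read_table_hermite(table, phase: float) -> float:
    """Read a single-cycle table at a fractional, non-negative position."""
    n = len(table)
    i = math.floor(phase)
    frac = phase - i
    i %= n
    return hermite4(table[(i - 1) % n], table[i],
                    table[(i + 1) % n], table[(i + 2) % n], frac)


# Compare against linear interpolation on a 64-point sine, halfway between points.
table = [math.sin(2.0 * math.pi * t / 64) for t in range(64)]
exact = math.sin(2.0 * math.pi * 10.5 / 64)
cubic_err = abs(read_table_hermite(table, 10.5) - exact)
linear_err = abs(0.5 * (table[10] + table[11]) - exact)
print(cubic_err, linear_err)
```

On a smooth table like this the cubic error is much smaller than the linear one, which is in line with the measurements reported here.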



>what is the oversampling ratio?  (which is half the wavetable size 
>divided by the index of the highest non-zero harmonic.  and how loud 
>*is* that harmonic?)

I use 8192 point wavetables, which may either be rendered using 256
harmonics, or directly by "shape", so it can be a "perfect" sawtooth, with
all harmonics. We could say that I'm going to accept minimum pitch of say
40Hz. With 20k high limit, there needs to be 500 harmonics then.
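
The harmonic counts used here are simple to tabulate. A small sketch, with made-up names `harmonics_up_to` and `min_table_size`; the `2K + 1` lower bound on table size is taken from the "200 harmonics, 401 samples" figure given later in the thread and assumes high-quality interpolation.

```python
import math


def harmonics_up_to(f0_hz: float, limit_hz: float) -> int:
    """Number of harmonics of f0_hz that lie at or below limit_hz."""
    return math.floor(limit_hz / f0_hz)


def min_table_size(num_harmonics: int) -> int:
    """Smallest table that can hold num_harmonics without ambiguity (2K + 1)."""
    return 2 * num_harmonics + 1


for f0 in (40.0, 70.0, 261.63):
    k = harmonics_up_to(f0, 20000.0)
    print(f0, k, min_table_size(k))
```

For 40 Hz this gives the 500 harmonics mentioned above; for 70 Hz it gives 285, so 256 harmonics stop a little short of 20 kHz (70 * 256 = 17920 Hz).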


Anyway some more tests:
Sawtooth wave, 171Hz (C4), 44kHz sampling rate, analysed using FFT 65536
points, hann window, checking up to -150dB:

1) bandlimited (1 per oct), 3x oversampling (to exceed 96k), downsampling
using minphase 72dB/oct

cubic interpolation => crystal clear

linear interpolation => alias (probably), e.g. 670Hz, -110dB 

2) bandlimited (1 per oct), no oversampling

cubic interpolation => clear, alias can be measured as the residue from
the top octave in the bandlimited wavetable, but I don't hear a difference

linear interpolation => same as cubic, but an additional line of aliased
frequencies, e.g. again 670Hz, -110dB 

3) no bandlimit

cubic & linear interpolation cannot be really distinguished in all that
alias, e.g. 150Hz, -50dB


So the linear interpolation really isn't enough even for big wavetables.
With 2048 point wavetable and cubic interpolation with 3x oversampling, the
results were similar to when linear-interpolation was used, but it was even
worse.



>> When it comes to oversampling, then I cannot really afford too much. Say 4x
>> oversampling is reasonable, 512x absolutely not.
>
>you missed the point.  the oversampling is accomplished by having a 
>wavetable of N points with very little or *no* energy in the harmonics 
>with indices anywhere close to N/2.

Aaaaah ok!
Anyway the band limiting with 8192 samples (even if there are just say 20
harmonics!) seems working very well. But still, if I decrease the wavetable
size, 



>> And it must be zero-latency, because it is realtime.
>>
>
>so a 32-sample delay (from using 16-sample double buffering) is gonna 
>make it not realtime?
>
>"realtime" does not mean the same as "live".  but even "live" can handle 
>a little delay.  you get 44 samples of delay for every foot you stand 
>away from your amp or monitor speakers.

I cannot fully agree. 44 samples is 1 millisecond and every millisecond is
relevant. Of course, we can live with it, but if we can avoid it... The
minimum phase filter should be fine with it.

Btw. where did you get the "32-sample" delay from? You mean
linear-phase filtering with 32 point FIR? I don't know, but I'd say it can
hardly be that steep with so few points, or could it? I'm generally
using 512 points or even 2048 points if I need linear-phase.


>>
>> The "arithmetics" - I think I understand it now. So the idea is that when
>> we limit our hearing to 20k and have Nyquist at say 24k, then there is 8k
>> space, which we can fill with any mess, including alias and we are ok.
>
>it's a 4 kHz space.

Hmmm, then I'm missing something - let's say I create a wavetable for
generating pitches from 100Hz and the limit 20k, sampling rate 48k, so
Nyquist 24k. Then if I generate 100 * (28000/20000) = 140 Hz pitch, then
the highest harmonic will be at 28k, which will alias from 24k to 20k,
right? So where am I wrong?


>> Personally I'm not sure about the theory, that we can totally ignore
>> everything above 20k, after all there are headphones capable of reproducing
>> 30kHz.
>
>better do some blind testing.  with false negatives to keep the test 
>subjects honest.

That's a really hard thing to do, because many people are "trained". There
are even apps to train the ears, online. And it also depends on the
situation - e.g. if you are in a noisy environment, the high frequency
resolution gets lower. If you don't sleep well, same thing. So the
listening tests are good as a "guide", but saying "we can easily create
anything above 20k" just because neither I nor anybody around me hears it
isn't such a good idea.


>bump it down to 18 kHz, then you're back to 2.

Not really a good idea, some people just have better ears, and they can
measure it... some really do.

Simply put, I cannot judge ears for other people. It's even possible our
hearing will get better in the future with meds and stuff. And imagine
someone makes some music, that will be "ugly" in the future, because
suddenly people will hear above 20k.

I know it's a little extreme attitude, but still... Though your idea with
having more band limited wavetables is probably the only good solution.
Though even with 1 table / oct it looks good now.

jungledmnc	 


Reply by Bob Masta ● July 12, 2014

On Thu, 10 Jul 2014 09:15:38 -0500, "jungledmnc"
<34728@dsprelated> wrote:

<snip>
>Hmmm, then I'm missing something - let's say I create a wavetable for
>generating pitches from 100Hz and the limit 20k, sampling rate 48k, so
>Nyquist 24k. Then if I generate 100 * (28000/20000) = 140 Hz pitch, then
>the highest harmonic will be at 28k, which will alias from 24k to 20k,
>right? So where am I wrong?

I may be missing something as well, but doing this as a
(simple-minded) thought experiment I imagine a table holding
one cycle of a ramp wave.   You change the output frequency
by changing the step size through the table, while keeping
the step rate fixed at the sample rate. 

Since this is a linear ramp, you can use simple linear
interpolation for steps that land between table value.  As
you increase the step size, you traverse around the table
faster, but the wave shape doesn't change... it still ramps
up to maximum and abruptly drops to minimum and repeats.

So if the anti-alias filter is happy with that abrupt drop
(and resultant spectral splatter) at low output frequencies,
wouldn't it be happy at higher frequencies as well?

Best regards,

Bob Masta

              DAQARTA  v7.60
   Data AcQuisition And Real-Time Analysis
              www.daqarta.com
Scope, Spectrum, Spectrogram, Sound Level Meter
 Frequency Counter, Pitch Track, Pitch-to-MIDI 
   FREE Signal Generator, DaqMusiq generator    
          Science with your sound card!

Reply by robert bristow-johnson ● July 12, 2014

On 7/12/14 8:26 AM, Bob Masta wrote:
> On Thu, 10 Jul 2014 09:15:38 -0500, "jungledmnc"
> <34728@dsprelated>  wrote:
>
> <snip>
>> Hmmm, then I'm missing something - let's say I create a wavetable for
>> generating pitches from 100Hz and the limit 20k, sampling rate 48k, so
>> Nyquist 24k. Then if I generate 100 * (28000/20000) = 140 Hz pitch, then
>> the highest harmonic will be at 28k, which will alias from 24k to 20k,
>> right? So where am I wrong?
>
> I may be missing something as well, but doing this as a
> (simple-minded) thought experiment I imagine a table holding
> one cycle of a ramp wave.

it *should* be a bandlimited ramp wave.

>   You change the output frequency
> by changing the step size through the table, while keeping
> the step rate fixed at the sample rate.
>
> Since this is a linear ramp,

but it isn't exactly that.  it is an approximation of the linear ramp 
with a finite number of non-zero harmonics.  start with an actual linear 
ramp (with harmonics that decay as 1/k) and zero the coefficient of all 
harmonics above the Kth harmonic (i have changed the notation from the 
"Nth" harmonic, because "N" is now gonna be the FFT length).  so all FFT 
bins between K and N-K are set to zero, then inverse FFT.  that's the 
waveform.
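
A minimal sketch of that construction. Instead of running an FFT, zeroing bins and inverting, it sums the first K harmonics of the ramp's Fourier series directly (amplitudes falling as 1/k), which yields the same table as writing those coefficients into bins 1..K and their mirror images and taking the inverse FFT. The names `bandlimited_saw` and `harmonic_amplitude`, and the sizes N = 2048 and K = 200, are illustrative assumptions.

```python
import cmath
import math


def bandlimited_saw(n: int, k_max: int):
    """One cycle of a rising ramp built from harmonics 1..k_max only."""
    if 2 * k_max >= n:
        raise ValueError("k_max must stay below n / 2")
    table = []
    for t in range(n):
        theta = 2.0 * math.pi * t / n
        # Fourier series of a ramp from -1 to +1: -(2/pi) * sum(sin(k*theta) / k)
        s = sum(math.sin(k * theta) / k for k in range(1, k_max + 1))
        table.append(-2.0 / math.pi * s)
    return table


def harmonic_amplitude(table, k: int) -> float:
    """Amplitude of harmonic k in the table, measured with a single DFT bin."""
    n = len(table)
    acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(table))
    return 2.0 * abs(acc) / n


table = bandlimited_saw(2048, 200)
print(harmonic_amplitude(table, 200))  # last harmonic kept, about 2 / (200 * pi)
print(harmonic_amplitude(table, 201))  # first harmonic dropped, essentially zero
```

Everything between bin K and bin N-K is empty by construction, so the table is oversampled by roughly (N/2)/K.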

now at lower pitches, when K is a larger value, the waveform ramp will 
look more linear.  but at higher pitches K is smaller, there are fewer 
harmonics, and the waveform will look a little sloppier.  in all cases 
the edge is a little bit sloppy compared to a perfect analog (more 
precisely "continuous-time") waveform.  but if you pass that perfect 
analog waveform through a perfect analog brick-wall filter set to, say 
20 kHz (or for deafies like me, even a little lower), no one will hear 
it differently.  our hearing is more sophisticated than a simple linear 
Fourier analysis machine, but we *do* have finite hearing range.  there 
is *some* limit such that if all frequencies above that limit are 
removed, we don't hear it.  again, if someone claims that they hear the 
difference between the sawtooth with harmonics that go to say 30 kHz, 
and another identical sawtooth with the same harmonics up to, say, 19 
kHz, if they claim that, there are methods of blind testing, complete 
with false negative and false positives (to keep us all honest).

> you can use simple linear
> interpolation for steps that land between table value.

even with a little curvature, linear interpolation works quite well when 
the Nyquist frequency is much much higher than the highest harmonic.  we 
have come up with mathematical expressions that compute the entire 
energy of the images, that in worst case, can all fold back into the 
baseband.

>  As
> you increase the step size, you traverse around the table
> faster, but the wave shape doesn't change... it still ramps
> up to maximum and abruptly drops to minimum and repeats.

how abrupt depends on the number of non-zero harmonics and the step size 
(what i like to sometimes call the "stride").  but for higher pitches, 
there are fewer non-zero harmonics, but the stride is also larger, so 
it's about the same for lower or higher pitches.
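
A bare-bones sketch of the table read with a stride, assuming a single-cycle table, a floating-point phase accumulator and linear interpolation. The function name `render` and the 64-point sine table are placeholders for illustration; a real oscillator would also pick the table by pitch.

```python
import math


def render(table, freq_hz: float, sample_rate: float, num_samples: int):
    """Read a single-cycle table at a fixed sample rate, stepping by `stride`."""
    n = len(table)
    stride = freq_hz * n / sample_rate   # table positions advanced per output sample
    phase = 0.0                          # fractional read position in [0, n)
    out = []
    for _ in range(num_samples):
        i = int(phase)                   # integer part selects the table entry
        frac = phase - i                 # fractional part drives the interpolation
        a = table[i]
        b = table[(i + 1) % n]           # wrap around at the end of the cycle
        out.append(a + frac * (b - a))   # linear interpolation
        phase = (phase + stride) % n
    return out


table = [math.sin(2.0 * math.pi * t / 64) for t in range(64)]
low = render(table, 100.0, 48000.0, 480)    # small stride, one full cycle
high = render(table, 1000.0, 48000.0, 48)   # ten times the stride, one full cycle
```

Only the stride changes with pitch; the step rate stays at the sample rate.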

> So if the anti-alias filter is happy with that abrupt drop
> (and resultant spectral splatter) at low output frequencies,
> wouldn't it be happy at higher frequencies as well?

if you set it up correctly (using different wavetables for lower pitches 
than for higher pitches), it comes out just as well for lower or higher 
pitches.

-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

Reply by jungledmnc ● July 16, 2014

> Hmmm, then I'm missing something - let's say I create a wavetable for
> generating pitches from 100Hz and the limit 20k, sampling rate 48k, so
> Nyquist 24k. Then if I generate 100 * (28000/20000) = 140 Hz pitch, then
> the highest harmonic will be at 28k, which will alias from 24k to 20k,
> right? So where am I wrong?

Just to clarify - could you check this r-b-j? I just don't see where am I
wrong.	 


Reply by Dave ● July 16, 2014

On Saturday, July 12, 2014 8:26:39 AM UTC-4, Bob Masta wrote:
> On Thu, 10 Jul 2014 09:15:38 -0500, "jungledmnc"
> <34728@dsprelated> wrote:
>
> <snip>
> >Hmmm, then I'm missing something - let's say I create a wavetable for
> >generating pitches from 100Hz and the limit 20k, sampling rate 48k, so
> >Nyquist 24k. Then if I generate 100 * (28000/20000) = 140 Hz pitch, then
> >the highest harmonic will be at 28k, which will alias from 24k to 20k,
> >right? So where am I wrong?
>
> I may be missing something as well, but doing this as a
> (simple-minded) thought experiment I imagine a table holding
> one cycle of a ramp wave.   You change the output frequency
> by changing the step size through the table, while keeping
> the step rate fixed at the sample rate.
>
> Since this is a linear ramp, you can use simple linear
> interpolation for steps that land between table value.  As
> you increase the step size, you traverse around the table
> faster, but the wave shape doesn't change... it still ramps
> up to maximum and abruptly drops to minimum and repeats.

I think I read somewhere of using a virtual table index, i.e. one that is larger than your actual table, so that it reduces the quantization error in the index that can build up over time.

There was also a technique of using two tables. The first one has coarse granularity and the second one has finer granularity, but only spanning one step of the first table. Then by using trigonometric identities you can calculate cos(A+B), where A comes from the first table and B comes from the second table.

Cheers,
Dave

Reply by robert bristow-johnson ● July 16, 2014

On 7/16/14 4:17 AM, jungledmnc wrote:
>> Hmmm, then I'm missing something - let's say I create a wavetable for
>> generating pitches from 100Hz and the limit 20k, sampling rate 48k, so
>> Nyquist 24k. Then if I generate 100 * (28000/20000) = 140 Hz pitch, then
>> the highest harmonic will be at 28k, which will alias from 24k to 20k,
>> right? So where am I wrong?
>
> Just to clarify - could you check this r-b-j? I just don't see where am I
> wrong.	

nothing wrong.  200 harmonics in either case.  with high-quality 
interpolation (better than linear), you would need a minimum of 401 
samples in the wavetable.

note that 140 Hz is about 1/2 octave above 100 Hz.

*semantically* i would still say that the guard band is 4 kHz, not 8. 
when you're sampling at 48 kHz and your Nyquist is at 24 kHz, there 
really *is* no 28 kHz.
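
The numbers in this example are easy to check. A small sketch, assuming ordinary folding around the Nyquist frequency; the helper name `aliased_frequency` is made up for illustration.

```python
def aliased_frequency(freq_hz: float, sample_rate: float) -> float:
    """Frequency at which a component shows up after sampling at sample_rate."""
    f = freq_hz % sample_rate
    return sample_rate - f if f > sample_rate / 2 else f


K = 200            # harmonics in the table built for 100 Hz with a 20 kHz limit
FS = 48000.0

for f0 in (100.0, 140.0):
    top = K * f0   # frequency of the highest harmonic at this pitch
    print(f0, top, aliased_frequency(top, FS))
# 100 Hz: highest harmonic at 20000 Hz, stays at 20000 Hz
# 140 Hz: highest harmonic at 28000 Hz, folds back to 20000 Hz

print(2 * K + 1)   # 401, the minimum table size quoted above
```

In the sampled signal the 28 kHz component only ever exists as its 20 kHz image, which is why the guard band is counted as 4 kHz (20 kHz to 24 kHz) rather than 8.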

-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

Reply by jungledmnc ● July 16, 2014

Aaaah, great, got it! Thank you!

jungledmnc	 


Reply by robert bristow-johnson ● July 16, 2014

On 7/16/14 9:04 AM, Dave wrote:
>
> I think I read somewhere of using a virtual table index, i.e. one that is larger than your actual table, so that it reduces the quantization error in the index that can build up over time.
>
> There was also a technique of using two tables. The first one has coarse granularity and the second one has finer granularity, but only spanning one step of the first table. Then by using trigonometric identities you can calculate cos(A+B), where A comes from the first table and B comes from the second table.

this would work for a single sine wave in the wavetable.  i think i came 
across this before in the context of non-musical use (like it was an 
"NCO" or "DDS" or whatever they're calling it nowadaze).  you can have a 
sin and cos table (it can be the same table) for "t" and then another 
sin and cos table for "delta_t" and do

    sin(t + delta_t)  =  cos(t)*sin(delta_t)  +  sin(t)*cos(delta_t)

where delta_t is smaller than the difference between adjacent t values 
in the first table.
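
A minimal sketch of the two-table idea for a pure sine, assuming 64 coarse steps per cycle and 64 fine steps per coarse step. The sizes and the names `COARSE`, `FINE` and `sin_two_tables` are arbitrary choices for illustration; separate sin and cos lists are used only for readability.

```python
import math

COARSE = 64   # coarse steps per cycle (illustrative size)
FINE = 64     # fine steps inside one coarse step (illustrative size)

STEP = 2.0 * math.pi / COARSE
coarse_sin = [math.sin(STEP * i) for i in range(COARSE)]
coarse_cos = [math.cos(STEP * i) for i in range(COARSE)]
fine_sin = [math.sin(STEP * j / FINE) for j in range(FINE)]
fine_cos = [math.cos(STEP * j / FINE) for j in range(FINE)]


def sin_two_tables(index: int) -> float:
    """sin(2*pi*index / (COARSE*FINE)) from two small tables and the
    angle-addition identity sin(t + dt) = cos(t)*sin(dt) + sin(t)*cos(dt)."""
    i, j = divmod(index % (COARSE * FINE), FINE)
    return coarse_cos[i] * fine_sin[j] + coarse_sin[i] * fine_cos[j]


print(sin_two_tables(1024), math.sin(2.0 * math.pi * 1024 / 4096))
print(sin_two_tables(1000), math.sin(2.0 * math.pi * 1000 / 4096))
```

Here 4 * 64 stored values stand in for a single 4096-entry sine table, at the cost of two multiplies and an add per lookup.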

for a general waveshape and some finite-order polynomial interpolation, 
you can have one table for each power of the "delta_t" portion and add 
up the power series real fast.  in the case of 1st-order power series 
(a.k.a. linear interpolation), you can easily eliminate that second 
table with a mere subtraction.
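
A small sketch of the first-order case, assuming a single-cycle table that wraps around and a non-negative read position. One version stores a second table of per-segment slopes (the coefficient of the first power of the fractional part); the other drops that table and recomputes the slope with a subtraction on every read. All names here are made up for illustration.

```python
import math


def build_linear_tables(table):
    """Coefficient tables for y = c0[i] + frac * c1[i]."""
    n = len(table)
    c0 = list(table)
    c1 = [table[(i + 1) % n] - table[i] for i in range(n)]  # slope per segment
    return c0, c1


def read_with_tables(c0, c1, phase: float) -> float:
    i = math.floor(phase)
    frac = phase - i
    i %= len(c0)
    return c0[i] + frac * c1[i]


def read_with_subtraction(table, phase: float) -> float:
    """Same result without the second table: the slope is one subtraction."""
    n = len(table)
    i = math.floor(phase)
    frac = phase - i
    i %= n
    return table[i] + frac * (table[(i + 1) % n] - table[i])


table = [math.sin(2.0 * math.pi * t / 32) for t in range(32)]
c0, c1 = build_linear_tables(table)
print(read_with_tables(c0, c1, 5.25), read_with_subtraction(table, 5.25))
```

Higher-order interpolators extend the same pattern with one extra coefficient table per power of the fractional part.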

-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."