
The $10000 Hi-Fi

Started by Unknown May 3, 2015
robert bristow-johnson  <rbj@audioimagination.com> wrote:

>On 5/6/15 1:45 AM, Steve Pope wrote:
>> The real point is, when you quantize from analog and/or reduce precision,
>> dithering is wise.  If there is not already enough noise in the
>> source signal to auto-dither the thing, then add explicit dithering.
>> Don't waste brain cycles arguing "it might not matter", just do it,
>> unless the cost is prohibitive, which it almost never is, especially
>> in audio.
[...]
>wellllll....
>
>i do audio.  in audio algorithms i deal with, there are several,
>sometimes *many* points of quantization.  like, often, every gain block
>and more.
>
>dither isn't that cheap.  a good RNG costs instruction cycles.  cheap
>RNGs (like linear congruence) usually sound like crap.  building TPDF
>dither doubles the cost (unless you're building that high-pass TPDF
>which does not double the cost).
>
>i have found that undithered truncating *with* noise-shaping (like that
>fixed-point DC blocking trick at dspguru.com) which Randy calls
>"fraction saving" (a *very* good name for it) works *very* well for most
>of these internal nodes (at 24 bit).  it steers the quantization noise
>into the top octave (and away from the bottom 8 or 9 octaves) where i am
>deaf anyway.  and it solves the DC limit-cycle problem which is why it's
>useful in a DC-blocking filter or any other audio filter with high-Q
>poles (an annoyance is when absolute silence goes into your filter after
>some sound and the level comes down to -70 dB and gets stuck there).
>
>and, with floating-point, i most often don't do anything.  even with
>single-precision float.  i'm trying to understand how i can cheaply get
>those lost 8 bits in the SHArC when it goes from 40 internal bits to 32.
>i would think that rounding error can be used with noise shaping
>feedback in the same manner as with fixed-point error shaping.
>
>however, like when going from 24-bit to 16-bit (like in mastering),
>really good dither and noise-shaping (or using "colored dither" like
>UV22) is mandatory.  dunno what should be done if mastering to MP3.  i
>thought the MP3 coding alg figgered it out anyway.
>
>so, in my opinion, some brainy judgment is useful.
Thank you for the insight.  You are of course correct, and I was
over-generalizing above.

Tangent: it is possible, at least in a lot of situations I encounter, to
choose precisions such that the noise present (either as added dither
noise, or otherwise) at the input(s) to an algorithmic block has the
effect of dithering all of the precision-reduction points within that
block, so the only additional point where dithering might be desired is
at the output(s) of the block.  I suspect this is true fairly generally,
given enough freedom to choose precisions.  (Hence my phrasing above,
"if there isn't already enough noise in the signal to dither" the
precision-reduction point.)  But you might end up with some really long
word widths, and if you're designing to a processor word width or a
processor floating-point format, you likely won't have that freedom.

The dual of this tangent: if you're not applying sufficient noise to the
input of such a block, then some of the precision-reduction points in
the internal signal flow will be effectively undithered, and the signals
at those points quite often exhibit unexpected, non-intuitive, and/or
undesired behavior.  I have seen design projects get into such a state
on various systems I have worked on, and the time spent trying to
understand the implications of every instance of undithered behavior can
easily eat up hours that would arguably be better spent on weightier
problems.  This is where a blanket-dithering design methodology is quite
useful, and in very large systems almost mandatory, because the
alternative is so time-consuming.

Having said that, I am sure the trade-offs you describe above are
necessary in audio.

Steve
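For concreteness, here is a minimal sketch of what "add explicit dithering"
at a single precision-reduction point can look like, reducing 24-bit samples
to 16 bits.  This is an illustration only, not code from anyone in this
thread; the rand()-based noise source, the word sizes, and the saturation
limits are arbitrary choices, and a real implementation would want a better
RNG, as rbj notes above.

/* TPDF-dithered requantization, 24-bit -> 16-bit.
 * One 16-bit output LSB equals 256 input counts, so TPDF dither spanning
 * roughly +/- one output LSB is the sum of two uniforms in [-128, 127].
 * The "high-pass TPDF" variant rbj mentions reuses the previous uniform
 * draw, costing one RNG call per sample and tilting the dither spectrum
 * toward high frequencies.                                              */
#include <stdint.h>
#include <stdlib.h>

static int32_t rpdf_half_lsb(void)              /* uniform in [-128, 127]     */
{
    return (int32_t)(rand() & 0xFF) - 128;
}

static int16_t requant_24_to_16_tpdf(int32_t x24)
{
    int32_t d = rpdf_half_lsb() + rpdf_half_lsb();  /* TPDF, ~ +/-1 output LSB */
    int32_t y = (x24 + d + 128) >> 8;               /* dither, round, drop 8 bits */
    if (y >  32767) y =  32767;                     /* saturate to 16 bits    */
    if (y < -32768) y = -32768;
    return (int16_t)y;
}

static int16_t requant_24_to_16_hptpdf(int32_t x24)
{
    static int32_t u_prev = 0;
    int32_t u = rpdf_half_lsb();
    int32_t d = u - u_prev;                         /* d[n] = u[n] - u[n-1]   */
    u_prev = u;
    int32_t y = (x24 + d + 128) >> 8;
    if (y >  32767) y =  32767;
    if (y < -32768) y = -32768;
    return (int16_t)y;
}

(The right shift of a negative int32_t is assumed to be arithmetic, which
holds on essentially every platform used for audio work.)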
robert bristow-johnson  <rbj@audioimagination.com> wrote:

>On 5/6/15 9:23 AM, Steve Pope wrote:
>> Not sure Sony was the first, but they had an early 16-bit
>> quantizer (I believe marketed as a "PCM unit", although that's
>> a misnomer) that piggybacked onto a VCR used as a data recorder.
>> This was around 1978.
>i think it's the Sony F1. and they used betamax, not VHS.
Yes, it had to be Betamax, which I had completely forgotten existed.
>and this is
>the reason why 44.1 kHz (or, more precisely, 44.056 kHz with the F1)
>became the CD sample rate standard.  very icky.  too bad they didn't go
>with 48 kHz.
I had not realized the relationship there.  Thanks.

Steve
[...snip...]

>> Cedron <103185@dsprelated> wrote:
>>
>>> Do you apply software audio compression?  (not data compression).  Many
>>> years ago I derived the formulas for a compression curve that extended the
>>> straight line portion with a hyperbola
>
>something like  x - (1-1/r)*log(k + e^x)  ?
Strictly speaking, I don't think that is a hyperbola.  This was many
years ago, so off the top of my head:

   A / ( x - B ) + C

The inputs were the slope of the line and, optionally, the cutoff point.
The program would then solve for A, B, and C.  The curve would match the
end of the line, the first derivative would match the slope, and the
curve would hit (1,1).  If the cutoff point was not specified, an
"optimal" one was selected, but I can't remember what the criterion was
that made it optimal.

Anyway, if you plotted the input/output graph you got a straight line up
to a point where it started gradually bending until it hit the upper
right corner.  If you selected a slope less than one, the graph would
curve upward to hit the corner.

I believe it was invertible, so if you ran it through with a slope of m,
then ran that through with a slope of 1/m, you got your original back
(minus rounding errors).
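A sketch of the three-constraint solve described above, under the
assumption (mine, not stated above) that the straight-line portion passes
through the origin with slope m and ends at a cutoff x0.  Variable names
are mine; the closed form falls out of matching value and slope at x0 and
forcing the curve through (1,1).

/* Fit y = A/(x - B) + C so that it meets the line y = m*x at x = x0 with
 * matching value and slope, and passes through (1, 1).
 * With u = x0 - B, d = 1 - x0, k = (1 - m*x0)/m, the slope constraint
 * -A/(x0 - B)^2 = m plus the two value constraints give u = k*d/(d - k). */
#include <math.h>

typedef struct { double A, B, C; } hyper_knee;

static int fit_hyper_knee(double m, double x0, hyper_knee *h)
{
    double d = 1.0 - x0;
    double k = (1.0 - m * x0) / m;
    if (fabs(d - k) < 1e-12)
        return -1;                  /* m == 1: the line already hits (1,1) */
    double u = k * d / (d - k);
    h->A = -m * u * u;
    h->B = x0 - u;
    h->C = m * (x0 + u);
    return 0;
}

static double apply_knee(const hyper_knee *h, double m, double x0, double x)
{
    return (x <= x0) ? m * x : h->A / (x - h->B) + h->C;
}

For example, m = 2 and x0 = 0.25 gives A = -0.28125, B = -0.125, C = 1.25;
the curve passes through (0.25, 0.5) with slope 2 and through (1, 1).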
> >("r" is the compression ratio after the bend. "k" is the knee >softness.) > > >>> that matched the first derivative >>> so there was no knee effect. > >a bend without a knee? what does that mean? > >do you actually mean a "soft knee"? a knee with programmable softness? >
Yes, I suppose so.  An extremely soft knee, as in no discontinuity in
the first derivative, and a smoothly declining second derivative past
the small discontinuity at the juncture.

-----------

I am finding this discussion fascinating.  As I mentioned earlier, I am
rewriting my recording program in Linux using ALSA and XLib.  Most of
the work is in the display and recording controls.  I already have a
simple prototype that blindly records.

My intention is to stick with CD quality, i.e. 44100 Hz, 16-bit.  For
the recording portion I write out a WAV file header and then just copy
the data from the sound buffer to the file.

Would anybody recommend that I do it differently?  In particular, should
I add dithering?  Should I make 24-bit, 96 kHz a command-line option?

---------------------------------------
Posted through http://www.DSPRelated.com
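Since the question is concrete: the canonical 44-byte header for the kind
of plain PCM WAV file described above looks like the sketch below.  The
function names are mine.  One practical note: when recording live you
don't know the data length up front, so write placeholders for the two
size fields and seek back to patch them when recording stops.

/* Minimal canonical PCM WAV header (RIFF/WAVE, "fmt " + "data" chunks),
 * little-endian fields written byte by byte so it works on any host.   */
#include <stdint.h>
#include <stdio.h>

static void write_u32(FILE *f, uint32_t v)
{
    fputc(v & 0xFF, f); fputc((v >> 8) & 0xFF, f);
    fputc((v >> 16) & 0xFF, f); fputc((v >> 24) & 0xFF, f);
}
static void write_u16(FILE *f, uint16_t v)
{
    fputc(v & 0xFF, f); fputc((v >> 8) & 0xFF, f);
}

static void write_wav_header(FILE *f, uint32_t sample_rate,
                             uint16_t channels, uint16_t bits,
                             uint32_t data_bytes)
{
    uint16_t block_align = channels * (bits / 8);
    fwrite("RIFF", 1, 4, f);
    write_u32(f, 36 + data_bytes);           /* size of everything after this */
    fwrite("WAVE", 1, 4, f);
    fwrite("fmt ", 1, 4, f);
    write_u32(f, 16);                        /* PCM fmt chunk is 16 bytes     */
    write_u16(f, 1);                         /* format 1 = integer PCM        */
    write_u16(f, channels);
    write_u32(f, sample_rate);
    write_u32(f, sample_rate * block_align); /* byte rate                     */
    write_u16(f, block_align);
    write_u16(f, bits);
    fwrite("data", 1, 4, f);
    write_u32(f, data_bytes);                /* patch afterwards if unknown   */
}

A call like write_wav_header(f, 44100, 2, 16, nframes * 4) then leaves the
file positioned for the raw 16-bit little-endian sample data.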
On 5/6/2015 8:01 PM, Steve Pope wrote:
> robert bristow-johnson <rbj@audioimagination.com> wrote:
>
>> On 5/6/15 9:23 AM, Steve Pope wrote:
>
>>> Not sure Sony was the first, but they had an early 16-bit
>>> quantizer (I believe marketed as a "PCM unit", although that's
>>> a misnomer) that piggybacked onto a VCR used as a data recorder.
>>> This was around 1978.
>
>> i think it's the Sony F1.  and they used betamax, not VHS.
>
> Yes, it had to be Betamax, which I had completely forgotten
> existed.
>
>> and this is
>> the reason why 44.1 kHz (or, more precisely, 44.056 kHz with the F1)
>> became the CD sample rate standard.  very icky.  too bad they didn't go
>> with 48 kHz.
>
> I had not realized the relationship there.  Thanks.
I don't think the fact that it was Beta vs. VHS had anything to do with
it.  I think the sample rate is linked to the TV rates, which are the
same for both.  There were early digital recordings on VCRs and the CD
sample rate was set to be compatible.

--
Rick
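For reference, the arithmetic behind that point, using the usual published
figures for the PCM adaptors (the specific line counts are not stated in
this thread): the adaptors stored three 16-bit samples per usable video
line, so the sample rate had to divide evenly into the video field
structure.

#include <stdio.h>

int main(void)
{
    /* samples/s = fields/s * usable lines per field * samples per line */
    printf("NTSC mono : %.1f\n", 60.00 * 245 * 3);  /* 44100.0                  */
    printf("NTSC color: %.1f\n", 59.94 * 245 * 3);  /* 44055.9 ~ 44.056 kHz (F1) */
    printf("PAL       : %.1f\n", 50.00 * 294 * 3);  /* 44100.0                  */
    return 0;
}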
robert bristow-johnson <rbj@audioimagination.com> writes:
> [...]
> i would have a few little bones to pick with this paper.  first of all
> the author repeats the common mistake of confusing power spectrum
> ("white" vs. "colored") and p.d.f. (RPDF vs. TPDF vs. Gaussian).
> them's about different properties of a random process.  you most
> certainly **can** have TPDF *and* some kinds of colored noise.  my
> favorite
Robert, I was going to mention it too, but I grow tired of finding and
pointing out such faults in thinking...

--
Randy Yates
Digital Signal Labs
http://www.digitalsignallabs.com
On Wed, 06 May 2015 18:06:15 -0400, robert bristow-johnson
<rbj@audioimagination.com> wrote:

>i would have a few little bones to pick with this paper.  first of all
>the author repeats the common mistake of confusing power spectrum
>("white" vs. "colored") and p.d.f. (RPDF vs. TPDF vs. Gaussian).
Granted. But my point was that truncation introduces distortion while proper dither does not. I was most interested in finding a paper that included spectral plots demonstrating this.
On Wed, 06 May 2015 18:30:20 -0400, robert bristow-johnson
<rbj@audioimagination.com> wrote:

>whether it's with an old Freescale 56K or some other processor, a lot of
>internal arithmetic in audio algorithms is done at 24 bits.  32-bit
>floats have a 25-bit mantissa.
As I understand it, single-precision IEEE 754 floating point has 24-bit
precision, with 23 fraction bits explicitly stored.  Were you referring
to a different single-precision format?
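A quick way to check this on any machine with standard C; FLT_MANT_DIG and
the 2^24 integer test below are ordinary library facts, nothing specific to
this thread.

/* IEEE 754 binary32 stores 1 sign + 8 exponent + 23 fraction bits; with
 * the implicit leading 1, a normal number carries 24 significant bits.  */
#include <float.h>
#include <stdio.h>

int main(void)
{
    printf("FLT_MANT_DIG = %d\n", FLT_MANT_DIG);   /* 24 on IEEE systems */
    /* 2^24 is the last power of two up to which every integer is exact: */
    printf("16777216.0f + 1.0f == 16777216.0f ? %d\n",
           16777216.0f + 1.0f == 16777216.0f);     /* 1: 2^24 + 1 rounds away */
    return 0;
}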
robert bristow-johnson <rbj@audioimagination.com> wrote:

(snip, I wrote)
>> I use a pretty simple LFSR generator, based on CRC32, to generate
>> bits, shift and add.

> oooh, glen, that can't be good.  are you using the LFSR to just get
> random *bits* (like +1 and -1) that are white.  then it works good.  but
> if you're using the whole shift register for a random number, that's not
> so good.  that's because 50% of the time the following register value is
> related to the current by a factor of 2.  and the spectrum of this is
> *not* white but is low-pass.
It runs on the input data stream, so there are 24 shifts before the next
one is used.  Most likely there is plenty of noise in the low bits of the
input 24-bit data.

If the input is really quiet, it could have some zeros in the high bits,
but there should still be enough noise in the low bits.  These are
recorded with a live audience, but the amplifier noise is likely enough
to keep the lowest bits pretty random.  I could actually hear that noise
by turning my 100W amplifier all the way up and putting my ear right next
to the speaker.  If a normal signal came through, it might have destroyed
the speaker, so the noise is pretty far down.

But yes, there are probably better sources.  If I wanted to convert
16-bit down to 8-bit, I might worry more about it.

--
glen
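To make the objection and the answer concrete, here is a sketch (mine, not
glen's code) of a Galois LFSR using the bit-reversed CRC-32 polynomial.  If
you read the whole register every sample, consecutive outputs are largely
shifts of each other and the sequence is spectrally low-pass; collecting one
fresh bit per clock, or clocking the register many times between uses as
glen effectively does by running it over the input bits, avoids that.

#include <stdint.h>

static uint32_t lfsr_state = 0xDEADBEEFu;       /* any nonzero seed */

/* One Galois-LFSR clock with the (bit-reversed) CRC-32 polynomial;
 * returns a single pseudo-random bit.                               */
static inline uint32_t lfsr_clock(void)
{
    uint32_t bit = lfsr_state & 1u;
    lfsr_state = (lfsr_state >> 1) ^ (bit ? 0xEDB88320u : 0u);
    return bit;
}

/* Assemble an n-bit uniform (RPDF) dither word from n independent clocks,
 * so successive words are not simply related by a shift.                */
static uint32_t lfsr_word(int n)
{
    uint32_t v = 0;
    while (n-- > 0)
        v = (v << 1) | lfsr_clock();
    return v;
}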
rickman  <gnuarm@gmail.com> wrote:

>I am also a doubting Thomas, but I'm willing to leave some room for
>self-doubt (just not a lot).  I do know that truncation can do funny
>things as the frequencies in the signal beat with the truncation as well
>as each other, producing non-linear distortion.  But at -144 dBFS it is
>hard to imagine it would be in any way audible.  In 16 bits (-96 dBFS)
>I'm willing to acknowledge magic ears can hear it easily (my ears are
>far from magic).
>
>It is always possible that there was some flaw in the original design
>that got fixed when switching to rounding.  I just can't imagine anyone
>can hear the effects of 24 bit arithmetic.
Within a digital algorithm that includes feedback, switching from
undithered truncation to undithered rounding can make a large (generally
beneficial) difference in behavior, or even stability.  But viewed in
isolation, truncation vs. rounding, without dithering, both introduce
(often undesired) signal-correlated noise at similar levels.

Steve
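The "fraction saving" error feedback rbj described earlier addresses
exactly this class of problem inside a feedback loop.  A minimal sketch of
the technique as described above (not rbj's or Randy's actual code); the
24-bit output width is an arbitrary choice.

/* First-order error feedback ("fraction saving"): carry the bits discarded
 * by truncation into the next sample.  The output is
 *     y[n] = x[n] + e[n-1] - e[n],
 * so the quantization error is shaped by (z^-1 - 1): high-pass, with a
 * null at DC -- which is why it kills DC limit cycles.                   */
#include <stdint.h>

typedef struct {
    int64_t err;                    /* fraction saved from the previous sample */
} frac_save_t;

/* Reduce a 32-bit accumulator to 24 significant bits (zero the low 8 bits). */
static int32_t quant24_fraction_saving(frac_save_t *fs, int32_t acc)
{
    int64_t sum = (int64_t)acc + fs->err;  /* add back the saved fraction      */
    int64_t out = sum & ~(int64_t)0xFF;    /* truncate (toward -inf) by 8 bits */
    fs->err = sum - out;                   /* save the new fraction, 0..255    */
    return (int32_t)out;                   /* saturation omitted for brevity   */
}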
Greg Berchin  <gjberchin@chatter.net.invalid> wrote:

>On Wed, 06 May 2015 10:36:45 -0400, rickman <gnuarm@gmail.com> wrote:
>>When you say you "dithered" it at 14 bits, did you truncate/round the
>>data to 14 bits?  If not, you were just listening to added noise which
>>is very different.
>The bits below 14 (or whatever "N" was selected) were all 0.
>The code works in double precision floating point, scales as necessary
>to make 2^(N-1) represent full-scale, adds the dither (a fractional
>value), rounds [add 0.5 then floor()], and then re-scales to 16 bits.
That would be correct.

Tangentially, a vast range of fixed-point values and operations can be
cast to and from doubles without deviating from bit-exactness; that is,
the result will be the same as if you had tediously programmed it
entirely in fixed point.  Essentially all the projects I've worked on in
the past decade or so have been simulated in this manner.

Steve
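A sketch of the requantization step Greg describes, in the double-precision
style Steve mentions; the function names and the TPDF dither choice are
mine.  As long as the scaled values stay within the 53-bit exact-integer
range of a double, the result matches a pure fixed-point implementation bit
for bit.

#include <math.h>
#include <stdlib.h>

/* Uniform dither in [-0.5, 0.5) LSB. */
static double rpdf(void)
{
    return (double)rand() / ((double)RAND_MAX + 1.0) - 0.5;
}

/* Scale a +/-1.0 full-scale sample so 2^(nbits-1) represents full scale,
 * add dither, round by "add 0.5 then floor()", and clip to the legal range. */
static long quantize_to_bits(double x, int nbits)
{
    double fs = ldexp(1.0, nbits - 1);     /* 2^(nbits-1)        */
    double d  = rpdf() + rpdf();           /* TPDF, ~ +/-1 LSB   */
    double y  = floor(x * fs + d + 0.5);
    if (y >  fs - 1.0) y =  fs - 1.0;
    if (y < -fs)       y = -fs;
    return (long)y;
}

Greg's 14-bit experiment would then correspond to quantize_to_bits(x, 14)
followed by a left shift of two, so the value sits in a 16-bit container
with the bits below 14 all zero.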