DSPRelated.com
Forums

Audio CODECs

Started by Don Y February 10, 2012
Hi,

I'm looking for some pointers concerning the design of lossless
audio (plus "silence") codecs.

I want to deploy these on either end of a packet switched
network (coder at server, decoder at client).  I.e., they
are intended primarily for communication bandwidth reduction.
Push content into coder, pass over network, extract content
via decoder, *consume* (and discard).  I.e., the system makes
the network look like a long "virtual wire".


The decoder needs to be *fast*.  Ideally, suitable for on-the-fly
operation (i.e., having to "expand" an entire frame "in place"
is less desirable than being able to expand it AS CONSUMED).

[[[Note:  I am targeting general purpose MCU's, not DSP's!]]]

Smaller frame sizes are better than larger ones (they require
fewer resources to hold in the client WHILE CONSUMING).  And,
bigger frames mean bigger packets, which mean supporting fragmentation
and reassembly in the protocol stack, etc. -- or, alternatively,
a data stream more sensitive to dropped fragments.

(Some/much) content can be encoded a priori (e.g., as in
a media server) so the cost of coding or transcoding can
be considerably higher than decoding.  OTOH, it shouldn't
be so much higher that it precludes any "real-time" use.

The type of source material shouldn't have a dramatic effect on
the efficiency or cost of either coder or decoder (speech, music,
etc. -- don't worry about "white/pink/chartreuse/etc noise")


From some observations of existing CODECs (open and proprietary):

All try to encapsulate a variety of different source formats:
bits per sample, samples per second, seek points, tags, etc.

All try to apply different compression strategies which are
then encoded in the data stream.

Most seem to treat the source material as discrete "sessions"
(song 1, song 2, etc.) instead of an endless *stream* of content.

It appears that most compression gains come from exploiting
the reduced bandwidth of the difference channel.  This only
works if you have two (related) source channels -- i.e., the
compression is less remarkable for mono sources.
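As a concrete illustration of where that difference-channel gain comes from, here is a minimal sketch of the integer mid/side transform used (in some form) by several lossless codecs. The function names are mine, not from any particular codec; for well-correlated left/right channels the side signal stays near zero, so its residuals code into far fewer bits.

```python
def ms_encode(left, right):
    """Convert L/R sample lists to mid/side.  Integer-exact: the LSB
    dropped from (l + r) is recoverable from the parity of (l - r)."""
    mid = [(l + r) >> 1 for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Invert the transform losslessly.  (l + r) and (l - r) always
    share parity, so the missing LSB comes from the side channel."""
    left = [m + ((s + (s & 1)) >> 1) for m, s in zip(mid, side)]
    right = [l - s for l, s in zip(left, side)]
    return left, right
```

For a mono source there is no side channel to shrink, which is why the gain is less remarkable there.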

Coder efficiencies (costs) tend to vary greatly.  Often the
added "expense" results in very little additional gain in
compression (which can't be determined a priori).  While this
isn't significant for "batch" applications (encode, then store
for later distribution), it can be a deal breaker for "live"
content.


So...

In this sort of dedicated application, many of the "features"
of these CODECs are superfluous or redundant.  E.g., you can
probably fix the sample rate and compensate for variations
in source materials in the encoder (this makes it simpler for
the decoder to blindly reproduce that content without concern
for the actual sample rate of the original source).  Ditto
for sample sizes.

But, I'm not sure if you can as easily discard the adaptive
coding (decoding) strategies without knowing more about the
actual signal you will be encountering.

Do certain models/predictors/encodings tend to solve most
of the coding problem -- with the others present to cover
special cases?  E.g., without a difference channel, it
doesn't seem that RLE for the residual would be of much
use (?)
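For what it's worth, the one mono case where RLE on the residual clearly pays off is digital silence (or constant DC): the predictor tracks it perfectly, leaving long runs of zero residuals. A hypothetical sketch (names are illustrative, not from any real codec):

```python
def rle_zeros(residuals):
    """Collapse runs of zero residuals into (0, run_length) pairs and
    pass nonzero residuals through unchanged."""
    out, run = [], 0
    for r in residuals:
        if r == 0:
            run += 1
            continue
        if run:                     # flush any pending zero run
            out.append((0, run))
            run = 0
        out.append(r)
    if run:                         # trailing run of zeros
        out.append((0, run))
    return out
```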

Anything else I've missed as shortcuts to reduce the
complexity of the coder/decoder?  Any risks that these
shortcuts might have lurking inside them?  Pathological
cases that could (realistically) be encountered?

Any suggestions as to developing/acquiring a versatile
test suite with which to gauge performance?  Or, just
pick some of the things that are *likely* to pass down
the wire?

I've implemented this with a couple of different "open"
CODECs and am now trying to determine whether any changes
are worth attempting to improve on these criteria.

Thx!
--don

[Apologies if I don't reply quickly.  I'm rearranging machines
here so my news server is a mess -- and likely to get worse
before it "recovers".  I may try reading news via google just
to keep abreast of anything posted, here. ]

Don Y wrote:

> I'm looking for some pointers concerning the design of lossless
> audio (plus "silence") codecs.
The design is trivial: backwards adaptive predictor followed by conventional Huffman coder.
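The structure described here can be sketched in a few lines. This toy version substitutes a fixed first-order predictor for the adaptive one, and Rice codes for the Huffman stage (a pairing real lossless codecs such as Shorten and FLAC actually use); it emits a bitstring rather than packed bytes:

```python
def zigzag(n):
    """Map signed residuals to non-negative ints: 0,-1,1,-2,2 -> 0,1,2,3,4."""
    return (n << 1) if n >= 0 else (-n << 1) - 1

def rice_encode(value, k):
    """Rice-code a non-negative int: unary quotient, '0' terminator,
    then k binary remainder bits."""
    out = "1" * (value >> k) + "0"
    if k:
        out += format(value & ((1 << k) - 1), f"0{k}b")
    return out

def encode(samples, k=1):
    """Predict each sample from its predecessor and Rice-code the
    zig-zag-mapped prediction error."""
    bits, prev = [], 0
    for s in samples:
        bits.append(rice_encode(zigzag(s - prev), k))
        prev = s
    return "".join(bits)
```

A decoder inverts this one sample at a time, which suits the expand-as-consumed requirement; but the output length varies with the input, which is exactly the fixed-bandwidth objection raised below.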
> I want to deploy these on either end of a packet switched
> network (coder at server, decoder at client).  I.e., they
> are intended primarily for communication bandwidth reduction.
Lossless audio compressor is hardly useful in this scenario, as it does not guarantee a fixed bandwidth.
> The decoder needs to be *fast*.
Then omit backward adaptation. Transmit forward prediction coefficients over the channel.
> [[[Note: I am targeting general purpose MCU's, not DSP's!]]]
Like what, for example?
> Smaller frame sizes are better than larger ones (requires
> less resources to hold in the client WHILE CONSUMING).  And,
> bigger frames mean bigger packets mean supporting fragmentation
> and reassembly in the protocol stack, etc.  Alternatively,
> makes the data stream more sensitive to dropped fragments.
That's irrelevant. Large buffers will be needed to cope with the variable data rate.
> The type of source material shouldn't have a dramatic effect on
> the efficiency or cost of either coder or decoder (speech, music,
> etc. -- don't worry about "white/pink/chartreuse/etc noise")
The compression ratio is going to be only about 50% or so. Then why bother with compression at all?
> From some observations of existing CODECS (open and proprietary):
>
> All try to encapsulate a variety of different source formats:
> bits per sample, samples per second, seek points, tags, etc.
>
> All try to apply different compression strategies which are
> then encoded in the data stream.
The codecs are LOSSLESS. Once the session is started, you can't drop any data.
> Most seem to treat the source material as discrete "sessions"
> (song 1, song 2, etc.) instead of an endless *stream* of content.
So you will have to parse the uninterruptible stream from the very beginning. If you lose a packet, you are lost.
> So...
[...]

So.

I am tired of your bleat. Stop here and do anything useful other than
spewing the internet with nonsense.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
On Feb 10, 11:34 am, Vladimir Vassilevsky <nos...@nowhere.com> wrote:

> I am tired of your bleat.
Ha ha. To me, it seems that you live for this kind of "bleat".
Vladimir Vassilevsky <nospam@nowhere.com> wrote:

(snip)
> The design is trivial: backwards adaptive predictor followed by
> conventional Huffman coder.
>> I want to deploy these on either end of a packet switched
>> network (coder at server, decoder at client).  I.e., they
>> are intended primarily for communication bandwidth reduction.
> Lossless audio compressor is hardly useful in this scenario, as it does
> not guarantee a fixed bandwidth.
(snip)
>> All try to apply different compression strategies which are
>> then encoded in the data stream.
> The codecs are LOSSLESS. Once the session is started, you
> can't drop any data.
Not very useful in the real world, though.  Even back to the
beginning of CDs, there is error concealment when error correction
fails.

There used to be stories (maybe still are) of testing CD players
with a CD with a black wedge on it.  The wedge blocks the light for
an ever increasing length of time each revolution.  You can then
listen, and see at what point the error correction fails, and how
well the concealment sounds.  Things like that used to be in
reviews for CD players, but maybe not anymore.

With VoIP, you never know what will happen on the net, in terms of
delayed or lost packets.  Some kind of concealment is needed.

-- glen
Hi Vladimir,

On 2/10/2012 9:34 AM, Vladimir Vassilevsky wrote:
> Don Y wrote:
>
>> I'm looking for some pointers concerning the design of lossless
>> audio (plus "silence") codecs.
>
> The design is trivial: backwards adaptive predictor followed by
> conventional Huffman coder.
If it were "trivial", one-size-fits-all, someone would have designed, patented, and commercialized it -- and retired to some sunny beach to drink "tuna" coladas and watch cute young things frolic in the surf. The fact that there are so many different CODECs testifies to the non-triviality of the task.
>> I want to deploy these on either end of a packet switched
>> network (coder at server, decoder at client).  I.e., they
>> are intended primarily for communication bandwidth reduction.
>
> Lossless audio compressor is hardly useful in this scenario, as it does
> not guarantee a fixed bandwidth.
It doesn't *have* to guarantee a fixed bandwidth. It just has to nominally afford some reduction in required bandwidth to offset the implementation cost.
>> The decoder needs to be *fast*.
>
> Then omit backward adaptation. Transmit forward prediction coefficients
> over the channel.
>
>> [[[Note: I am targeting general purpose MCU's, not DSP's!]]]
>
> Like what, for example?
Like whatever the implementor decides is appropriate! If you can't design hardware, you might port it to a PC platform. If you've got a DSP in an existing product, port it there. Etc. Tying an implementation to a particular hardware platform is "premature optimization". Figure out what needs to be done (with your best effort) and use that to determine the minimum requirements for any hosting platform.
>> Smaller frame sizes are better than larger ones (requires
>> less resources to hold in the client WHILE CONSUMING).  And,
>> bigger frames mean bigger packets mean supporting fragmentation
>> and reassembly in the protocol stack, etc.  Alternatively,
>> makes the data stream more sensitive to dropped fragments.
>
> That's irrelevant.
> Large buffers will be needed to cope with the variable data rate.
No.  The model that you choose for the coder (and thus, decoder)
determines the resources that will be needed.

Silly example:  You're going to be passing (pure) sine waves down
the wire.  I can encode that as 4 values: amplitude, frequency,
phase and duration.  The decoder can take those four values and
reconstruct an equivalent sine wave with almost *0* resources at
its disposal.

This is why the knowledge of folks with first-hand experience is
worthwhile.  What models work best in which circumstances, etc.
An encoder for speech is not going to be as effective encoding
"music" (speech has lots of silence).  OTOH, it might work well
encoding a (single) *singer*.
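That toy sine-wave decoder might look like this (parameter names and the sample rate are mine, purely for illustration); note it is a generator, i.e., it expands samples AS CONSUMED with no frame buffer:

```python
import math

def decode_tone(amplitude, freq_hz, phase, duration_s, rate=8000):
    """Regenerate a pure tone from the 4 transmitted values, yielding
    one sample at a time with essentially zero decoder state."""
    for n in range(int(duration_s * rate)):
        yield amplitude * math.sin(2 * math.pi * freq_hz * n / rate + phase)
```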
>> The type of source material shouldn't have a dramatic effect on
>> the efficiency or cost of either coder or decoder (speech, music,
>> etc. -- don't worry about "white/pink/chartreuse/etc noise")
>
> The compression ratio is going to be only about 50% or so.
> Then why bother about compression at all?
Because compressing is cheaper than running another wire or upgrading the communications fabric to handle higher bandwidths. Save a dollar on the processor and spend thousands on more cable?
>> From some observations of existing CODECS (open and proprietary):
>>
>> All try to encapsulate a variety of different source formats:
>> bits per sample, samples per second, seek points, tags, etc.
>>
>> All try to apply different compression strategies which are
>> then encoded in the data stream.
>
> The codecs are LOSSLESS. Once the session is started, you can't drop any
> data.
That doesn't directly depend on the strategy used in doing the compression. Rather, that depends on how the protocol handles errors/dropouts. OTOH, a CODEC that has to handle music *and* speech might choose to use different strategies to represent that data based on its knowledge/examination of the data stream.
>> Most seem to treat the source material as discrete "sessions"
>> (song 1, song 2, etc.) instead of an endless *stream* of content.
>
> So you will have to parse the uninterruptible stream from the very
> beginning. If you lose a packet, you are lost.
No. Only if the coder stores no "absolute state" in the data stream. This doesn't seem to be the case for the codecs that I've examined. It would be a burden to implementors for that reason as well as making the stream "unseekable" (or, only seekable at elevated expense) -- your decoder would have to process all of the "skipped over" content in order to accurately track state.
>> So...
>
> [...]
>
> So.
>
> I am tired of your bleat. Stop here and do anything useful other than
> spewing the internet with nonsense.
Add me to your kill file. Or, discipline yourself not to open any posts with my name on them. I don't try to "elude" filters. I always post from the same IP, use the same news service, etc. I'm sure someone like you should be able to figure out how to rid yourself of these "unpleasant distractions". If not, ask one of the kids in the neighborhood to show you how...
Hi Glen,

On 2/10/2012 12:36 PM, glen herrmannsfeldt wrote:
> Vladimir Vassilevsky <nospam@nowhere.com> wrote:
>
>> The design is trivial: backwards adaptive predictor followed by
>> conventional Huffman coder.
>
>>> I want to deploy these on either end of a packet switched
>>> network (coder at server, decoder at client).  I.e., they
>>> are intended primarily for communication bandwidth reduction.
>
>> Lossless audio compressor is hardly useful in this scenario, as it does
>> not guarantee a fixed bandwidth.
>
>>> All try to apply different compression strategies which are
>>> then encoded in the data stream.
>
>> The codecs are LOSSLESS. Once the session is started, you
>> can't drop any data.
>
> Not very useful in the real world, though.  Even back to the
> beginning of CDs, there is error concealment when error
> correction fails.
I'm not even asking for the decoder to *fix* those errors, dropouts,
etc.  As long as it can INDICATE where errors exist so that I can
take my own remedial action.

But, the point of my statement ("different compression strategies")
was to elicit comments as to why certain strategies/encodings are
preferable to others AND IN WHICH CIRCUMSTANCES.  I.e., why isn't a
single strategy employed?  Or, why only <some_number>?
> There used to be stories (maybe still are) of testing CD players
> with a CD with a black wedge on it.  The wedge blocks the light
> for an ever increasing length of time each revolution.  You can
> then listen, and see at what point the error correction fails,
> and how well the concealment sounds.  Things like that used to
> be in reviews for CD players, but maybe not anymore.
>
> With VoIP, you never know what will happen on the net, in terms
> of delayed or lost packets.  Some kind of concealment is needed.
Yes.  Though you can do things to minimize the "discomfort" of
those errors -- within reason.  E.g., reproducing a signal with
periodic dropouts at a "high" frequency (e.g., so the signal sounds
to be gated off, often) is probably more annoying than dropping the
connection.  Or, muting until signal delivery is stable enough for
"more practical" use.

If the decoder can indicate problems, "something" can be done to
resolve them in a manner that is appropriate to the application.