Does a time-based audio compression algorithm exist?
Started by 6 years ago ● 12 replies ● latest reply 6 years ago ● 369 views
This may sound like an odd question, but I was wondering if there was such a thing as a time-based audio compression algorithm?
What I mean by a "time-based audio compression algorithm" is a compression algorithm which compresses on the time scale (x-axis), rather than the dynamic range (y-axis).
For instance, when compressing a 32-bit audio file to 16-bit, I assume most compression algorithms simply reduce the dynamic range and chop off (or limit) any audio over 96.33 dB (the 16-bit dB maximum).
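As a side note on the arithmetic above: 96.33 dB is the theoretical dynamic range of 16-bit linear PCM, 20*log10(2^16). The minimal Python sketch below reproduces that figure and shows one simple way a 32-bit integer sample could be reduced to 16 bits (plain truncation, no dither; the function names are made up for illustration). Under that assumption, the bits that get dropped are the least significant ones, i.e. the quietest detail, rather than anything above a loudness ceiling.

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of linear PCM with the given bit depth."""
    return 20 * math.log10(2 ** bits)

def requantize_32_to_16(sample: int) -> int:
    """Reduce a signed 32-bit integer sample to 16 bits by dropping
    the 16 least significant bits (truncation, no dither)."""
    return sample >> 16

print(round(dynamic_range_db(16), 2))        # 16-bit dynamic range in dB
print(requantize_32_to_16(2 ** 31 - 1))      # loudest positive sample survives
print(requantize_32_to_16(65535))            # very quiet sample collapses to 0
```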
But does such a thing exist on x-axis (for an audio wave)?
For instance, let's say we had an audio file consisting of a simple sine-wave:
I envision this algorithm understanding that the sine-wave consisted of repetitions of the same audio signal and (like an image compression algorithm) would reduce an audio wave (see left-hand audio wave) to its most unique form (see right-hand audio wave).
So for example, from an image perspective, the following image:
...would be compressed into the following image:
Note: Keep in mind that the compression algorithm would have to keep track of the mapping from the compressed image to the uncompressed image, in order to re-generate the uncompressed image (during a decompression).
If I had to guess, this algorithm would be like an inverse de-noiser.
Any help would be greatly appreciated.
It seems to me that tying the idea (or the objective) to a single sinusoid rather misses the point of the approach being more general. "Time compression" might be viewed as a reduction in the sample rate - which has lots of supporting work.
As a related notion, speech compression is done by extracting critical elements and coding them so as to take a lot less bandwidth. This seems similar to what you want. e.g. take a sinusoid or something more complicated and extract the pitch and amplitude. Then re-create based on pitch and amplitude. Again, there is lots of supporting work on this.
Thank you for your input.
Can I assume that all of the examples you've mentioned (regarding sample-rate reduction and speech compression) are lossy compression methods?
In other words, if we were to compress the phrase:
...spoken by some person (let's say myself), would that phrase sound different after being compressed and decompressed than it did before compression?
In other words, the speech compression algorithm would know what words were being spoken (using the fundamental frequency for each word / note in a phrase), but would not be able to retain the timbre of those compressed words (ex. the sound of my voice)?
The same applies to sample-rate reduction: can I assume that once we've reduced the sample rate of some audio signal, the data lost during the reduction can never be recovered exactly (but can likely be approximated)?
Your last question implies that you want lossless compression. Yes, of course, these methods aren't strictly lossless... That is, if one assumes that the waveform has anything interesting in it, then it's changing. If it's changing then it will have spectral components that will be lost (in theory at least) if you reduce the sample rate OR if you do parameter extraction / reconstruction. Both approaches are intended to reduce the bandwidth. And, it might be said that reduction in bandwidth is what allows a reduced sample rate (without undue aliasing, etc.). Anyway, only you can decide how much bandwidth reduction would be acceptable. There is *always* some but systems are designed to allow it / to make it acceptable.
The speech compression works way better than what you describe and is in daily use throughout the world. I don't think we'd say that the timbre of our voices is being overly harmed - just sometimes :-)
You can have lossless compression of course. But the effectiveness (i.e. the compression ratio) depends on the sparsity of the original signal/pattern.
As a matter of history, I once did a project that used repetitive records. We had been using an endless-loop tape recording and decided to apply newer digital memory technology. Because the digital memories were small compared to the tape loop, the "loop" was shorter. Either way, the result is periodic. But with the shorter memory, the period was shorter and the periodicity was evident in the output. You may not have been able to hear it, but a spectrum analyzer showed that it was made up of spectral lines instead of being more "continuous" in frequency. It was rather obviously "synthetic". That's what will happen if you try to use a short record to represent what should be a long record.... Just food for thought here.
I think you don't mean downsampling, but rather finding a time interval over which a pattern repeats, then sending only the pattern and telling the receiver (Rx) where the pattern starts and ends.
Obviously you need to transmit information that is changing, so the pattern has to be dynamic. Such a pattern will be sensitive to the phase of its various components and will be too complicated to generate or recreate, but with some prayer anything could be possible unless proved otherwise.
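For the idealized case only (a signal that repeats exactly, sample for sample), the "send the pattern once and tell the receiver how long to repeat it" idea can be sketched as below. This is a toy illustration, not an existing codec; the function names are hypothetical, and real audio almost never repeats exactly, which is the difficulty raised above.

```python
def find_period(samples):
    """Smallest p such that samples[i] == samples[i - p] for every i >= p.
    Falls back to len(samples) when nothing shorter repeats exactly."""
    n = len(samples)
    for p in range(1, n + 1):
        if all(samples[i] == samples[i - p] for i in range(p, n)):
            return p
    return n

def compress_periodic(samples):
    """Keep one period plus the total length (lossless for exact repetition)."""
    p = find_period(samples)
    return samples[:p], len(samples)

def decompress_periodic(pattern, length):
    """Repeat the stored pattern until the original length is reached."""
    return [pattern[i % len(pattern)] for i in range(length)]

# One cycle of a coarsely quantized sine wave, repeated 8 times.
cycle = [0, 707, 1000, 707, 0, -707, -1000, -707]
signal = cycle * 8

pattern, length = compress_periodic(signal)
restored = decompress_periodic(pattern, length)
print(len(signal), "samples stored as", len(pattern), "samples plus a length")
```

A single sample that deviates from the pattern makes `find_period` fall back to the full length, so nothing is saved in that case.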
There is something similar in linear predictive coding (LPC or ADPCM). Here you try to estimate the next sample by a linear combination of past samples. Sometimes these codecs contain something called long-term prediction: you try to find repeating patterns in the time domain and subtract them from your current samples in order to get the amplitude down.
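A minimal sketch of the prediction idea, assuming integer samples and a fixed predictor (coefficients 2 and -1, i.e. "continue the straight line through the last two samples") rather than coefficients fitted to the signal as a real LPC codec would do. The function names are made up for illustration. Only the residual needs to be stored, and with integer coefficients the original can be rebuilt exactly.

```python
import math

def lp_residual(samples, coeffs):
    """residual[n] = x[n] - sum(coeffs[k] * x[n-1-k]).
    The first len(coeffs) samples are passed through unchanged."""
    order = len(coeffs)
    out = list(samples[:order])
    for n in range(order, len(samples)):
        pred = sum(c * samples[n - 1 - k] for k, c in enumerate(coeffs))
        out.append(samples[n] - pred)
    return out

def lp_reconstruct(residual, coeffs):
    """Invert lp_residual by re-running the same predictor on decoded samples."""
    order = len(coeffs)
    x = list(residual[:order])
    for n in range(order, len(residual)):
        pred = sum(c * x[n - 1 - k] for k, c in enumerate(coeffs))
        x.append(residual[n] + pred)
    return x

coeffs = [2, -1]  # predict x[n] as 2*x[n-1] - x[n-2]
samples = [round(10000 * math.sin(2 * math.pi * i / 100)) for i in range(400)]
residual = lp_residual(samples, coeffs)

print("peak sample:  ", max(abs(s) for s in samples))
print("peak residual:", max(abs(r) for r in residual[2:]))
```

Long-term prediction follows the same pattern, except that the prediction is taken from roughly one pitch period back instead of from the immediately preceding samples.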
Something similar is also Codebook Excited Compression (compare CELP, code-excited linear prediction). Here a snippet of audio is compared to a bunch of standard waveforms from a codebook, the closest one is chosen, and only the index of that snippet is stored/transmitted. Some variants also use codebook + stretching + gain...
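The codebook lookup described above can be sketched as plain nearest-neighbour matching on fixed-length frames. This is a simplified, lossy illustration with a tiny hand-made codebook; the names and the codebook contents are assumptions, and actual speech codecs add a synthesis filter, gain, and perceptual weighting on top.

```python
def nearest_index(frame, codebook):
    """Index of the codebook entry with the smallest squared error to frame."""
    def squared_error(entry):
        return sum((a - b) ** 2 for a, b in zip(frame, entry))
    return min(range(len(codebook)), key=lambda i: squared_error(codebook[i]))

def encode(samples, codebook, frame_len):
    """Replace each full frame by the index of its closest codebook entry."""
    return [nearest_index(samples[i:i + frame_len], codebook)
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def decode(indices, codebook):
    """Rebuild an approximation by concatenating the chosen entries."""
    out = []
    for i in indices:
        out.extend(codebook[i])
    return out

codebook = [[0, 0, 0, 0], [1, 2, 2, 1], [-1, -2, -2, -1]]
samples = [1, 2, 3, 1, 0, 0, 1, 0, -1, -2, -2, -2]

indices = encode(samples, codebook, frame_len=4)
approx = decode(indices, codebook)
print(indices)  # one small index per 4-sample frame
print(approx)   # close to, but not identical to, the input
```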
Thank you for the input.
The use of estimation seems interesting, though I'd be interested how well something like that would work with something like room tone / noise.
Still it's something worth thinking about.
I think you've hit the nail on the head.
Data that we can't compress is usually considered "noise" by most compression algorithms.
For example, consider a 100 x 100 pixel image consisting entirely of black pixels, except for a lone white pixel.
The lone white pixel would be considered the noise in this example.
In order for an audio compression algorithm to compress (on the time-scale) an audio signal without losing any data during the compression, the audio signal itself would need to be trivial.
But if the audio signal was non-trivial (let's say it consisted of two people having a conversation in a crowded bar), I'm doubtful if an audio compression algorithm (if it exists) would be able to successfully compress the audio signal without heavily degrading the audio in the process.
Don't you think Fourier analysis is precisely about such cases?
Given [enough samples of] a periodic signal, there is a usually simpler representation of it in Fourier domain. For instance, in the idealized example you gave, the samples of the sine wave would be compressed into two floating-point numbers: one for the frequency and one for its amplitude (plus a third for the phase, if it needs to be preserved).
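A rough sketch of that idealized case, assuming a single tone that completes a whole number of cycles within the analysis window (otherwise spectral leakage spreads the energy over several bins and this simple peak-picking is no longer exact). It uses a naive DFT to keep the example self-contained; the function names are illustrative.

```python
import cmath
import math

def dft(samples):
    """Naive DFT, bins 0..n/2 only (enough for a real-valued signal)."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples))
            for k in range(n // 2 + 1)]

def dominant_tone(samples, sample_rate):
    """Frequency (Hz), amplitude and phase of the strongest non-DC bin."""
    n = len(samples)
    spectrum = dft(samples)
    k = max(range(1, len(spectrum)), key=lambda i: abs(spectrum[i]))
    return k * sample_rate / n, 2 * abs(spectrum[k]) / n, cmath.phase(spectrum[k])

def resynthesize(freq, amp, phase, sample_rate, n):
    """Rebuild the tone from its three parameters."""
    return [amp * math.cos(2 * math.pi * freq * t / sample_rate + phase)
            for t in range(n)]

sample_rate, n = 8000, 200  # 440 Hz fits exactly 11 cycles in this window
tone = [0.5 * math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(n)]

freq, amp, phase = dominant_tone(tone, sample_rate)
rebuilt = resynthesize(freq, amp, phase, sample_rate, n)
worst_error = max(abs(a - b) for a, b in zip(tone, rebuilt))
print(freq, amp, phase, worst_error)
```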
Can I assume that the following statement:
"Given [enough samples of] a periodic signal, there is a usually simpler representation of it in Fourier domain."
...only pertains to somewhat trivial audio signals?
In other words, the Fourier analysis of a single tone (ex. A @ 440 Hz) compared to an audience of people cheering, would work far better / be a more accurate analysis with a single tone than with the audience of people cheering?
Or is it simply that we would merely need to provide more sample information in order to *accurately perform a Fourier analysis on the audience of people cheering compared to the single tone?
*Note: When I say "accurately", I mean 100% accurate rather than simply an approximation.
If you're talking about domain-specific applications and excluding methods that have a popular frequency domain interpretation (such as LPC/LSF), a dirty trick I once used to compress speech is to take the time-domain (1st or 2nd) derivative and 7zip the file. It worked surprisingly well on periodic, noise-free signals, being almost on par with FLAC, because the derivative operation increases data repetition, which is taken care of by 7z. I don't expect this method to work even in a slightly more generalized case though.
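A minimal sketch of the "differentiate, then run a general-purpose compressor" trick, using Python's standard `lzma` module as a stand-in for 7-Zip (it writes an .xz stream, not a .7z archive) and a first difference as the derivative. The helper names are made up. Deltas are packed as 32-bit integers because the difference of two 16-bit samples can fall outside the 16-bit range. The round trip is lossless; how much smaller the result gets depends entirely on the signal.

```python
import lzma
import struct

def delta_encode(samples):
    """First difference: each sample minus the previous one (first minus 0)."""
    prev = 0
    out = []
    for s in samples:
        out.append(s - prev)
        prev = s
    return out

def delta_decode(deltas):
    """Running sum, the exact inverse of delta_encode."""
    total = 0
    out = []
    for d in deltas:
        total += d
        out.append(total)
    return out

def pack_compress(values):
    """Pack as little-endian 32-bit integers, then LZMA-compress."""
    raw = struct.pack(f"<{len(values)}i", *values)
    return lzma.compress(raw)

def decompress_unpack(blob):
    """Inverse of pack_compress."""
    raw = lzma.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 4}i", raw))

samples = [0, 100, 32767, -32768, -5, -5, 12]  # arbitrary 16-bit test values
blob = pack_compress(delta_encode(samples))
restored = delta_decode(decompress_unpack(blob))
print(restored == samples)
```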
But hey, you can always pass signals into an auto-encoding neural network and get some approximated reconstruction on the other end. When this is used in a recursive manner, it's essentially a non-linear version of LPC/ARMA modeling.
Out of curiosity, what would happen if the audio signal wasn't noise free?
Would the algorithm completely fall apart, or would it work but only less efficiently?
I ask as I'm working with extremely noisy signals.
It won't work at all on a noisy signal, since there's not much periodicity to begin with.
AFAIK, there aren't many purely time-domain compression algorithms for noisy signals, especially if you want exact reconstruction. If the requirement can be relaxed, you probably want to look into some vector-quantization technique.
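To illustrate why periodicity matters for lossless compression, the sketch below compares an exactly periodic tone with pseudo-random noise, both stored as 16-bit samples and passed through Python's standard `lzma` module. The signals and names are invented for the comparison; a real noisy recording would fall somewhere between these two extremes.

```python
import lzma
import math
import random
import struct

def compression_ratio(samples):
    """Compressed size divided by raw size for 16-bit little-endian samples."""
    raw = struct.pack(f"<{len(samples)}h", *samples)
    return len(lzma.compress(raw)) / len(raw)

# An exactly periodic tone: one 100-sample cycle repeated 441 times.
one_period = [round(10000 * math.sin(2 * math.pi * i / 100)) for i in range(100)]
tone = one_period * 441

# Uniform pseudo-random noise of the same length (fixed seed for repeatability).
rng = random.Random(0)
noise = [rng.randint(-32768, 32767) for _ in range(len(tone))]

tone_ratio = compression_ratio(tone)
noise_ratio = compression_ratio(noise)
print(f"tone:  {tone_ratio:.3f} of original size")
print(f"noise: {noise_ratio:.3f} of original size")
```

The tone shrinks to a small fraction of its raw size, while the noise does not shrink at all, since there is no repetition for the compressor to exploit.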