
Floating Point Data Compression?

Started by DigitalSignal November 6, 2008

DigitalSignal wrote:

> Sorry, I should make it clearer. We tried to find a way to compress
> the single precision floating point data streams losslessly. As a
> general case, the data acquisition system stores time domain data up
> to a few gigabytes. It is expensive to store the data in the portable
> device and slow to transfer them.
Do the processing of the raw data in place. Remove the clutter and store
only relevant information.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
On Fri, 07 Nov 2008 14:12:45 -0600, Vladimir Vassilevsky wrote:

> DigitalSignal wrote:
>
>> Sorry, I should make it clearer. We tried to find a way to compress the
>> single precision floating point data streams losslessly. As a general
>> case, the data acquisition system stores time domain data up to a few
>> gigabytes. It is expensive to store the data in the portable device and
>> slow to transfer them.
>
> Do the processing of the raw data in place. Remove the clutter and store
> only relevant information.
That's actually a pretty good description of any compression algorithm:
remove the clutter and save the relevant.

What you consider to be clutter vs. relevant has a big effect on whether
you use lossless or lossy compression (and what lossy compression
algorithm you use); beyond that, how you tell the clutter from the
relevant strongly guides the algorithm.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Do you need to implement control loops in software?
"Applied Control Theory for Embedded Systems" gives you just what it says.
See details at http://www.wescottdesign.com/actfes/actfes.html
On 8 Nov, 05:02, Tim Wescott <t...@justseemywebsite.com> wrote:
> On Fri, 07 Nov 2008 14:12:45 -0600, Vladimir Vassilevsky wrote:
> > Do the processing of the raw data in place. Remove the clutter and store
> > only relevant information.
>
> That's actually a pretty good description of any compression algorithm:
> remove the clutter and save the relevant.
>
> What you consider to be clutter vs. relevant has a big effect on whether
> you use lossless or lossy compression (and what lossy compression
> algorithm you use); beyond that, how you tell the clutter from the
> relevant strongly guides the algorithm.
There is another aspect which maybe comes into play only on rare
occasions, but is very relevant when it does: the economics of acquiring
the data and the psychology of the responsible decision-makers.

These days memory is cheap (a couple of hundred dollars per terabyte of
disk space), but that was not the case in the mid '90s. A friend of mine
wrote his MSc thesis on efficient compression and storage of seismic
data. This was the time when the company we both worked with installed
the second terabyte disk system nation-wide, and the guys brought back
truck-loads of Exabyte tapes (each of which stored 5-10 GBytes of data
and took two hours to load) after they had been to sea. So the logistics
of data handling and storage was a big deal at the time.

My friend did a good job with lossy compression. He stored the
essentials contained in the data in far less space than was needed for
the uncompressed data. He worked closely with the data processors, so
every trick he introduced in his storage scheme was evaluated for its
effects both on the storage and on the seismic images that were
processed from the reconstructed data. As far as I could tell, the
processors were able to get the same from the compressed data as from
the original data.

But my friend never received much interest for the method. Lots of
people held the opinion that "We've spent tens, maybe hundreds, of
millions of $$$ collecting these data. We will not do anything that
might compromise their present usefulness and future value." Which is a
perfectly understandable way of looking at things. In fact, I think I
agree with it.

Rune
DigitalSignal wrote:
> Sorry, I should make it clearer. We tried to find a way to compress
> the single precision floating point data streams losslessly. As a
> general case, the data acquisition system stores time domain data up
> to a few gigabytes. It is expensive to store the data in the portable
> device and slow to transfer them.
I wonder what kind of portable device generates floating point data.
Is there no ADC, or is there FP-based processing whose results you'd
like to store?

Hendrik vdH

Hendrik van der Heijden wrote:

> DigitalSignal wrote:
>
>> Sorry, I should make it clearer. We tried to find a way to compress
>> the single precision floating point data streams losslessly. As a
>> general case, the data acquisition system stores time domain data up
>> to a few gigabytes.
Probably because of the excessive oversampling and improper gain scaling. Perhaps the data size can be reduced by several times.
>> It is expensive to store the data in the portable
>> device and slow to transfer them.
>
> I wonder what kind of portable device generates floating point data.
> Is there no ADC, or is there FP-based processing whose results you'd
> like to store?
Good point.

On another note: it is generally impossible to do any floating point
operation losslessly. So the first step would be denormalization to
integers; then some predictive algorithm, then Huffman coding.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
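[Editor's note: to make the three-stage pipeline above concrete, here is a minimal C sketch of the first two stages. It assumes the simplest possible predictor (the previous sample) and uses an XOR residual rather than an arithmetic difference, since XORing raw IEEE-754 bit patterns can never lose information; the Huffman stage is left out, and all function names are illustrative only.]

#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Reinterpret an IEEE-754 single as its 32-bit pattern (lossless). */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);        /* avoids strict-aliasing trouble */
    return u;
}

/* Predictive stage: predict each sample as the previous one and keep
 * only the XOR residual. For slowly varying signals the sign, exponent
 * and top mantissa bits of neighbouring samples agree, so the residuals
 * are mostly leading zeros -- exactly what a Huffman or Rice coder can
 * exploit in the final stage. */
void xor_residuals(const float *x, uint32_t *res, size_t n)
{
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cur = float_bits(x[i]);
        res[i] = cur ^ prev;         /* zero when the prediction is exact */
        prev = cur;
    }
}

/* Decompression runs the same loop in reverse: cur = res[i] ^ prev. */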
Hendrik,

The CoCo-80 (www.go-ci.com) reaches 130~150 dB in general. If we were
just doing a simple data acquisition task, we would not convert into
floating point, and the compression could be done in fixed-point
format. The issue is that with all kinds of filtering and spectral
analysis processing, the data manipulation gets very complicated. We
would rather keep all the data in single-precision floating point.
This is a practical matter.

James
www.go-ci.com
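[Editor's note: a rough check on those numbers, assuming the 130~150 dB figure refers to dynamic range. Each bit of fixed-point resolution buys about 6.02 dB (20*log10(2)), so 150 dB needs 150 / 6.02, or roughly 25 bits, already more than a 24-bit ADC word. A single-precision float carries a 24-bit significand (about 144 dB at any one scale), but its 8-bit exponent slides that window across roughly 1500 dB, which is why it stays a comfortable container once filtering and spectral processing start rescaling the data.]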
On Fri, 07 Nov 2008 00:10:26 -0600, Tim Wescott
<tim@justseemywebsite.com> wrote:

>On Thu, 06 Nov 2008 14:19:42 -0800, DigitalSignal wrote:
>
>> Hi there, A quick question: Is there any way to compress single
>> precision floating point data? Apparently most of the research and
>> development work focuses on fixed point compression.
>>
>> James
>> www.go-ci.com
>
>Yes. Set all the values in your vector to zero. Then transmit the
>number of samples in your vector.
>
>Clarify your question and maybe you'll get a meaningful answer.
>
>Lossy? Lossless? Any specific type of input data, such as still
>pictures, video, generic audio or voice? There are any number of lossy
>compression algorithms that are just as meaningful with floating point
>data as the source stream as with fixed point; but if you're talking
>lossless compression then you're pretty much down to the algorithms you
>find in zip, and their aunts, uncles, cousins and in-laws.
Actually, there are newer lossless audio compression algorithms that
provide much improved compression for digital signals (audio, seismic
data, and similar signals from a digitized transducer) over the standard
data-processing compression algorithms used in .zip and similar formats.
Such legacy algorithms barely give 10 percent compression or so on such
signal data, hardly better than compressing a random file. Newer
algorithms rely on the "signal" nature of the data (the fact that
successive samples are highly correlated rather than being a string of
random data) for their compression and can give up to 50 percent
lossless compression.

One of the DVD encoding methods is basically ADPCM with error bits
included in the data so that each sample is perfectly recreated on
decompression.

Read here, especially the Modeling and Residual Coding parts:
http://flac.sourceforge.net/documentation_format_overview.html

And here, under Comparisons:
http://en.wikipedia.org/wiki/Free_Lossless_Audio_Codec
"FLAC is specifically designed for efficient packing of audio data,
unlike general lossless algorithms such as ZIP and gzip. While ZIP may
compress a CD-quality audio file by 10 - 20%, FLAC achieves compression
rates of 30 - 50% for most music, with significantly greater compression
for voice recordings."

After seeing the most recent post from the OP and looking at the website
http://www.go-ci.com/ it seems to me this sort of compression is exactly
the thing for a device storing data coming from a 24-bit A/D. Not sure
how a floating-point DSP would handle that, though.
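[Editor's note: for readers who do not want to dig through the FLAC documents, here is a minimal C sketch of the modeling/residual-coding split described above, assuming integer samples such as those from a 24-bit A/D. It uses one of FLAC's fixed predictors (order 2) and reduces the Rice coder to a size estimate rather than an actual bitstream; the names are illustrative, not FLAC's API.]

#include <stdint.h>
#include <stddef.h>

/* Modeling stage: FLAC-style fixed order-2 predictor. Predict
 * x[i] ~ 2*x[i-1] - x[i-2] and keep only the residual. For 24-bit
 * samples the arithmetic comfortably fits in 32 bits. */
void fixed_predict2(const int32_t *x, int32_t *res, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int32_t p = 0;
        if (i >= 2)      p = 2 * x[i-1] - x[i-2];
        else if (i == 1) p = x[i-1];   /* warm-up: fall back to order 1 */
        res[i] = x[i] - p;
    }
}

/* Map signed residuals 0, -1, 1, -2, ... onto 0, 1, 2, 3, ...
 * without signed-overflow traps. */
static uint32_t zigzag(int32_t r)
{
    return r >= 0 ? (uint32_t)r * 2u
                  : (uint32_t)(-(int64_t)r) * 2u - 1u;
}

/* Residual-coding stage, reduced to a cost estimate: a Rice code with
 * parameter k spends (u >> k) unary bits, one stop bit and k binary
 * bits per residual. Correlated signals leave residuals near zero,
 * so the total lands well below 32 bits per sample. */
size_t rice_cost_bits(const int32_t *res, size_t n, unsigned k)
{
    size_t bits = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t u = zigzag(res[i]);
        bits += (u >> k) + 1 + k;
    }
    return bits;
}

A real encoder would try several k values (and predictor orders) per block and keep the cheapest, which is essentially what FLAC's fixed mode does.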