DSPRelated.com
Forums

Human hearing instataneous dynamic rage?

Started by Richard Owlett August 21, 2008

Richard Owlett wrote:

> > > > This outgrowth of an offshoot of a general interest in problems related > to speech recognition. The only budget I have for this is my time and > access to the Web. My current project is representing > time/frequency/intensity of sound in 3D - the spectrograms that are > typically used just don't "work" for me.
I don't know how you define "work for me" exactly. You are interested in making a visual display of your frequency vs. time data, so the issue is the dynamic range of your visual capabilities more than the range of hearing.

You could display it as a greyscale image where light and dark are used to represent magnitudes. If you did that you would be converting the data to 8 bits, but in reality your eyes can only distinguish about 100 levels of gray at most. So storing the data as 16 bits is far more than you need for that type of display.

If you plot the data as a 3D surface, that will certainly increase the range of what you perceive. A dynamic range of 16 bits would mean that if the largest feature in your display were bigger than a house, the smallest could be smaller than a grain of sand.

-jim
> > The purpose of this round of questions was to get some idea of how to > scale the plot to be both "pleasing" and useful. My current idea is to > experiment with plotting the data on a linear scale with contours > displayed at logarithmic intervals.
----== Posted via Pronews.Com - Unlimited-Unrestricted-Secure Usenet News==---- http://www.pronews.com The #1 Newsgroup Service in the World! >100,000 Newsgroups ---= - Total Privacy via Encryption =---
jim wrote:
> > Richard Owlett wrote: > > >>This outgrowth of an offshoot of a general interest in problems related >>to speech recognition. The only budget I have for this is my time and >>access to the Web. My current project is representing >>time/frequency/intensity of sound in 3D - the spectrograms that are >>typically used just don't "work" for me. > > > I don't know how you define "work for me" exactly. You are interested > in making a visual display of your frequency vs time data. So the issue > is the dynamic range of your visual capabilities more than the range of > hearing.
NO
> > You could display it as greyscale
*NO* That's the major problem I have with spectrograms.
> image where light and dark are used
> to represzent magnitudes. If you did that you would be converting the > data to 8 bits, but in reality your eyes can only distinguish about 100 > levels of gray at the most. So storing the data as 16 bits is way more > than you need for that type of display. > If you plot the data as a 3d surface
Who said anything about a _surface_?
> that certainly will increase the > range of what you perceive. A dynamic range of 16 bits
So just where did "16 bits" magically come from? Subject is "Human hearing dynamic range". Quoting my original post: "Subject line probably poorly stated. When dynamic range of human ear is discussed it's usually comparing threshold of pain to weakest detectable sound. I'm more interested in comparing a loud and soft sound being distinguished at the same time."
> would mean that > if the largest feature in your display were bigger than a house then the > smallest could be smaller than a grain of sand. > > -jim > > >>The purpose of this round of questions was to get some idea of how to >>scale the plot to be both "pleasing" and useful. My current idea is to >>experiment with plotting the data on a linear scale with contours >>displayed at logarithmic intervals. > > > > ----== Posted via Pronews.Com - Unlimited-Unrestricted-Secure Usenet News==---- > http://www.pronews.com The #1 Newsgroup Service in the World! >100,000 Newsgroups > ---= - Total Privacy via Encryption =---

Richard Owlett wrote:
> > jim wrote: > > > > Richard Owlett wrote: > > > > > >>This outgrowth of an offshoot of a general interest in problems related > >>to speech recognition. The only budget I have for this is my time and > >>access to the Web. My current project is representing > >>time/frequency/intensity of sound in 3D - the spectrograms that are > >>typically used just don't "work" for me. > > > > > > I don't know how you define "work for me" exactly. You are interested > > in making a visual display of your frequency vs time data. So the issue > > is the dynamic range of your visual capabilities more than the range of > > hearing. > > NO > > > > > You could display it as greyscale > > *NO* That's the major problem I have spectrograms.
What is?
> > image where light and dark are used > > to represzent magnitudes. If you did that you would be converting the > > data to 8 bits, but in reality your eyes can only distinguish about 100 > > levels of gray at the most. So storing the data as 16 bits is way more > > than you need for that type of display. > > If you plot the data as a 3d surface > > Who said anything about a _surface_?
Yes, right, you said contours this time. The point remains the same: 16 bits is an enormous range for visualization. It will generally look not much different from 8 bits.
> > > that certainly will increase the > > range of what you perceive. A dynamic range of 16 bits > > So just where did "16 bits" magically come from? > Subject is "Human hearing dynamic range".
Same place 8 bits came from: these are standard sizes for computer data. I assumed you are using standard computer equipment, making it extremely unlikely that you will be viewing the spectrum with some sort of display format of 5 or 11 bits or whatever. It doesn't seem to matter how great the dynamic range of the audio equipment is if you are viewing in 8 bits, which is already more than your eyes can see.

Also, speech recordings in 8 bits can be quite clear provided the sample rate is not too slow. I have heard pretty good 4-bit recordings of human speech.

-jim
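For readers who want to put numbers on the bit depths mentioned here, the following is a minimal sketch and not part of the original post. It assumes "dynamic range" means the amplitude ratio 2^bits expressed in decibels; the function names are made up for illustration.

```python
import math

def bits_to_db(bits):
    """Amplitude ratio of 2**bits expressed in decibels."""
    return 20.0 * math.log10(2.0 ** bits)

def levels_to_bits(levels):
    """Number of bits needed to index `levels` distinct steps."""
    return math.log2(levels)

# Bit depths mentioned in the thread
for bits in (4, 8, 16):
    print(f"{bits:2d} bits -> {bits_to_db(bits):5.1f} dB")

# Roughly 100 distinguishable grey levels (the figure quoted in the thread)
print(f"100 levels -> {levels_to_bits(100):.2f} bits")
```

Under that assumption, 8 bits is roughly 48 dB and 16 bits roughly 96 dB, while 100 grey levels need a little under 7 bits.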
> > Quoting my original post: > "Subject line probably poorly stated. > When dynamic range of human ear is discussed it's usually comparing > threshold of pain to weakest detectable sound. > > I'm more interested in comparing a loud and soft sound being > distinguished at the same time." > > > would mean that > > if the largest feature in your display were bigger than a house then the > > smallest could be smaller than a grain of sand. > > > > -jim > > > > > >>The purpose of this round of questions was to get some idea of how to > >>scale the plot to be both "pleasing" and useful. My current idea is to > >>experiment with plotting the data on a linear scale with contours > >>displayed at logarithmic intervals. > > > > > > > > ----== Posted via Pronews.Com - Unlimited-Unrestricted-Secure Usenet News==---- > > http://www.pronews.com The #1 Newsgroup Service in the World! >100,000 Newsgroups > > ---= - Total Privacy via Encryption =---
jim wrote:
> > Richard Owlett wrote: > >>jim wrote: >> >>>Richard Owlett wrote: >>> >>> >>> >>>>This outgrowth of an offshoot of a general interest in problems related >>>>to speech recognition. The only budget I have for this is my time and >>>>access to the Web. My current project is representing >>>>time/frequency/intensity of sound in 3D - the spectrograms that are >>>>typically used just don't "work" for me. >>> >>> >>> I don't know how you define "work for me" exactly. You are interested >>>in making a visual display of your frequency vs time data. So the issue >>>is the dynamic range of your visual capabilities more than the range of >>>hearing. >> >>NO >> >> >>> You could display it as greyscale >> >>*NO* That's the major problem I have spectrograms. > > > What is?
*GREYSCALE DISPLAY!!!!!*
> > >> image where light and dark are used >> >>>to represzent magnitudes. If you did that you would be converting the >>>data to 8 bits, but in reality your eyes can only distinguish about 100 >>>levels of gray at the most. So storing the data as 16 bits is way more >>>than you need for that type of display. >>> If you plot the data as a 3d surface >> >>Who said anything about a _surface_? > > > Yes right you said contours this time. The point remains the same 16 > bits is enormous range for visualization. It's going to generally look > not much different than 8 bits.
You keep reading in things that aren't there. Or perhaps you come equipped with a bionic ear. The subject is *human resolution* *NOT* machine representation!
> > >>>that certainly will increase the >>>range of what you perceive. A dynamic range of 16 bits >> >>So just where did "16 bits" magically come from? >>Subject is "Human hearing dynamic range". > > > Same place 8 bits came from - these are standard sizes for computer > data. I assumed you are using standard computer equipment making it > extremely unlikely you will be viewing the spectrum with some sort of > display format of 5 or 11 bits or whatever. It doesn't seem like it > matter how great the dynamic range of the audio equipment if you are > viewing in 8 bits which is already more than your eyes can see. > > Also speech recordings in 8 bits can be quite clear provided the sample > rate is not too slow. I have heard pretty good 4 bit recordings of human > speech. > -jim > > > >>Quoting my original post: >>"Subject line probably poorly stated. >>When dynamic range of human ear is discussed it's usually comparing >>threshold of pain to weakest detectable sound. >> >>I'm more interested in comparing a loud and soft sound being >>distinguished at the same time." >> >> >>>would mean that >>>if the largest feature in your display were bigger than a house then the >>>smallest could be smaller than a grain of sand. >>> >>>-jim >>> >>> >>> >>>>The purpose of this round of questions was to get some idea of how to >>>>scale the plot to be both "pleasing" and useful. My current idea is to >>>>experiment with plotting the data on a linear scale with contours >>>>displayed at logarithmic intervals. >>> >>> >>> >>>----== Posted via Pronews.Com - Unlimited-Unrestricted-Secure Usenet News==---- >>>http://www.pronews.com The #1 Newsgroup Service in the World! >100,000 Newsgroups >>>---= - Total Privacy via Encryption =--- > > > > ----== Posted via Pronews.Com - Unlimited-Unrestricted-Secure Usenet News==---- > http://www.pronews.com The #1 Newsgroup Service in the World! >100,000 Newsgroups > ---= - Total Privacy via Encryption =---
On Sun, 24 Aug 2008 05:58:25 -0500, Richard Owlett
<rowlett@atlascomm.net> wrote:

> ...
>This outgrowth of an offshoot of a general interest in problems related >to speech recognition.
Speech recognition by machine, or by human?
> The only budget I have for this is my time and >access to the Web. My current project is representing >time/frequency/intensity of sound in 3D - the spectrograms that are >typically used just don't "work" for me.
The common name for this (if for some strange reason you haven't heard it) is a waterfall plot. There should be plenty of info on that. IIRC, Microsoft Excel can create such a plot.
> >The purpose of this round of questions was to get some idea of how to >scale the plot to be both "pleasing" and useful. My current idea is to >experiment with plotting the data on a linear scale with contours >displayed at logarithmic intervals.
Ben Bradley wrote:
> On Sun, 24 Aug 2008 05:58:25 -0500, Richard Owlett > <rowlett@atlascomm.net> wrote: > > >>... > > >>This outgrowth of an offshoot of a general interest in problems related >>to speech recognition. > > > Speech recognition by machine, or by human?
The fascination covers both.

Back in the early I took an introductory linguistics course. The class was primarily Yanks. There were two guys from Dixie. The instructor had them say "pin" and "pen". They and the instructor were the only ones in the room who could distinguish the difference.

My current interest is the signal path (including physical environment) from vocal tract to data bus. I don't get into the semantic decoding at all.
> > >>The only budget I have for this is my time and >>access to the Web. My current project is representing >>time/frequency/intensity of sound in 3D - the spectrograms that are >>typically used just don't "work" for me. > > > The common name for this (if for some strange reason you haven't > heard it) is a waterfall plot. There should be plenty of info on that. > IIRC, Microsoft Excel can create such a plot.
Actually that's where I started. I spent hours in front of an RF spectrum analyzer in the 70's. It's now the only way I think of a spectrum. Waterfall plots I've seen don't allow rotating to view whichever feature catches my interest.

I use Scilab's param3d1(). It allows plotting with points rather than lines. I've been experimenting with using contour() in conjunction with it. The result is contours of equal amplitude hanging in 3D space. It has the advantage that I can look at it in 3D while someone used to spectrograms can rotate it, look down on the time-frequency plane, and see a color spectrogram.

I normalize my data to the max of all the FFT's in that experiment. That makes the largest features clear when plotted on a linear scale. Plotting to a log scale makes the small features also visible at the cost of *CLUTTER*.

The purpose of my question was to try to come up with a threshold below which not to plot a value. Scilab allows setting a variable to %nan ("Not A Number") and all plot routines will ignore it. If an element of a vector is %nan, all calculations using that element yield %nan without causing errors.
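The normalize-then-threshold idea described in this post can be sketched as follows. This is an illustration in Python rather than the poster's Scilab code; the function name and the -30 dB floor are placeholders, not a measured masking threshold, and NaN stands in for Scilab's %nan.

```python
import math

def to_db_with_floor(frames, floor_db=-30.0):
    """Normalize magnitudes to the global maximum, convert to dB,
    and replace anything below floor_db with NaN so a plotter can skip it."""
    peak = max(max(frame) for frame in frames)
    out = []
    for frame in frames:
        row = []
        for mag in frame:
            if mag <= 0.0:
                # log of zero is undefined; treat as "do not plot"
                row.append(math.nan)
                continue
            db = 20.0 * math.log10(mag / peak)
            row.append(db if db >= floor_db else math.nan)
        out.append(row)
    return out

# Two hypothetical FFT magnitude frames (made-up numbers)
frames = [[1.0, 0.5, 0.01], [0.1, 2.0, 0.001]]
result = to_db_with_floor(frames, floor_db=-30.0)
for row in result:
    print(row)
```

The largest value across all frames maps to 0 dB, and the two smallest entries fall below the floor and become NaN. Choosing the floor is exactly the open question of the thread.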
> > >>The purpose of this round of questions was to get some idea of how to >>scale the plot to be both "pleasing" and useful. My current idea is to >>experiment with plotting the data on a linear scale with contours >>displayed at logarithmic intervals. > >

Richard Owlett wrote:

> >>> If you plot the data as a 3d surface > >> > >>Who said anything about a _surface_? > > > > > > Yes right you said contours this time. The point remains the same 16 > > bits is enormous range for visualization. It's going to generally look > > not much different than 8 bits. > > You keep reading in things that aren't there. > Or perhaps you come equipped with a bionic ear. > The subject is *human resolution* *NOT* machine representation! > >
The subject line didn't make any sense to me. The body of the post I responded to asked specifically about machine representation. Here is what I read that I was responding to:

    My current project is representing time/frequency/intensity of sound in 3D -

    My current idea is to experiment with plotting the data on a linear scale with contours displayed at logarithmic intervals.

I assumed the representation to which you referred would be done on a computer. Are you saying that isn't true?

-jim
On Sun, 24 Aug 2008 14:45:10 -0500, Richard Owlett
<rowlett@atlascomm.net> wrote:

>jim wrote: >> >> Richard Owlett wrote: >> >> >>>This outgrowth of an offshoot of a general interest in problems related >>>to speech recognition. The only budget I have for this is my time and >>>access to the Web. My current project is representing >>>time/frequency/intensity of sound in 3D - the spectrograms that are >>>typically used just don't "work" for me. >> >> >> I don't know how you define "work for me" exactly. You are interested >> in making a visual display of your frequency vs time data. So the issue >> is the dynamic range of your visual capabilities more than the range of >> hearing. > >NO > >> >> You could display it as greyscale > >*NO* That's the major problem I have spectrograms.
Okay, you could display each successive increase in volume as a different color, perhaps using the standard resistor color code:

0 black
1 brown
2 red
3 orange
4 yellow
5 green
6 blue
7 violet
8 gray (or grey for the other side of the pond)
9 white

And for values of 10 or above it would just "wrap around." This would make it easy to distinguish between adjacent levels, but you'll misinterpret things when levels change in larger steps between adjacent FFT bins, such as 27, 38, 54.

You're looking at frequency vs. amplitude vs. time. I suspect one reason you're not seeing what you want is the length of the FFT. If it's long then it will smear higher frequency transients. Now that I think about it, it will smear all transients. The FFT displays things as if they were steady-state signals present for the whole duration of the window.

Another thing is the window used for the FFT. I recall that different windowing functions optimize for different things (for example, more accurate amplitude measurement vs. more accurate frequency measurement). You apparently want to optimize distinguishing between different frequencies (have a sharper slope for a displayed frequency, I forget what that's called, perhaps a "steeper skirt"). I forget what windows do what, but choosing the right window can make a dramatic difference over the wrong one.
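The wrap-around color idea can be sketched in a few lines. This is only an illustration: the names are hypothetical, and the table uses the standard resistor ordering (green for 5, blue for 6).

```python
# Standard resistor color code, index = digit value
RESISTOR_COLORS = ["black", "brown", "red", "orange", "yellow",
                   "green", "blue", "violet", "gray", "white"]

def level_to_color(level):
    """Map a non-negative integer level to a color, wrapping every 10 levels."""
    return RESISTOR_COLORS[level % 10]

# The example levels from the post
for level in (27, 38, 54):
    print(level, level_to_color(level))
```

Levels 27, 38 and 54 come out as violet, gray and yellow, which is indistinguishable from 7, 8 and 4. That is the misreading risk the post warns about.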
> > > image where light and dark are used >> to represzent magnitudes. If you did that you would be converting the >> data to 8 bits, but in reality your eyes can only distinguish about 100 >> levels of gray at the most. So storing the data as 16 bits is way more >> than you need for that type of display. >> If you plot the data as a 3d surface > >Who said anything about a _surface_?
It's a 3d image, so it effectively has a surface. But then I'm not sure if you want color or vertical height to represent amplitude or what.
> >> that certainly will increase the >> range of what you perceive. A dynamic range of 16 bits > >So just where did "16 bits" magically come from? >Subject is "Human hearing dynamic range".
Try to work with us, both Jim and I are trying to help you, and you're being a bit cantankerous. Over what range of values would you be displaying for amplitude? 20? 500? 50,000?
> >Quoting my original post: >"Subject line probably poorly stated. >When dynamic range of human ear is discussed it's usually comparing >threshold of pain to weakest detectable sound. > >I'm more interested in comparing a loud and soft sound being >distinguished at the same time."
And especially since these different sounds might be near in frequency, with one much louder than the other, the FFT length and windowing function are critically important.
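To make the dependence on FFT length and window concrete, here is a rough sketch. The helper names are made up, the 10 ms window is just an example value, and the main-lobe widths are the usual textbook approximations (null to null, in bins) for an FFT with no zero padding.

```python
def bin_spacing_hz(window_seconds):
    """Spacing between FFT bins for a window of the given duration (no zero padding)."""
    return 1.0 / window_seconds

# Approximate null-to-null main-lobe widths, in bins, for two common windows.
# The rectangular window has the narrower main lobe but much higher sidelobes
# (roughly -13 dB versus roughly -31 dB for Hann).
MAIN_LOBE_BINS = {"rectangular": 2.0, "hann": 4.0}

def main_lobe_width_hz(window_seconds, window="hann"):
    """Approximate main-lobe width in Hz for the chosen window."""
    return MAIN_LOBE_BINS[window] * bin_spacing_hz(window_seconds)

window_seconds = 0.010  # example: a 10 ms analysis window
print(bin_spacing_hz(window_seconds))
print(main_lobe_width_hz(window_seconds, "rectangular"))
print(main_lobe_width_hz(window_seconds, "hann"))
```

With a 10 ms window the bins are about 100 Hz apart and a Hann main lobe spans about 400 Hz, so a quiet tone closer than that to a loud one will tend to sit inside the loud tone's main lobe. A longer window narrows the lobe at the cost of time resolution.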
caveat lector
lingua in letifico ;)

Engineers read? I doubt.
Let's see if they can be led on a parsing trail, even if English ~BNF.

Let's parse the original subject line, "Human hearing instataneous 
dynamic rage?"

The first word is "human". That can be used as a noun or an adjective.
The second is "hearing". That can be used as a noun or an adjective.
The third is "instataneous". Missing from dictionary but resembles
     "instantaneous", an adjective.
The fourth is "dynamic". That is an adjective.
The last is "rage". That's a noun, but why use "rage" in an on-topic DSP
     post? The body of the post refers to "range". Another typo.


Parse on.
In English, four adjectives modifying one noun - unlikely.
Noun Noun would be strange/awkward.
Adjective Noun Adjective Adjective Noun seems a likely construct.

As this is comp.dsp, with many audio types lurking, it's unlikely 
that "hearing" is a law reference. The primary topic evidently concerns 
how humans hear.

Now to the second phrase. "Dynamic range" is a common term and meaning 
seems clear. But it is modified by "instantaneous". "Instantaneous" and 
"dynamic" just aren't commonly used together. Red flag raised.

The point is explicitly clarified in the first sentence of the second 
paragraph by contrasting
       "I'm more interested in comparing a loud and soft sound being
        distinguished at the same time."
to "dynamic range of human ear" being comparison of "threshold of pain" 
to "weakest detectable sound".


I was offered key words psychoacoustics, lossy compression methods, and 
masking. These proved useful for Google and Wikipedia searches. I also 
was given some historical background on measurement/reception of 
distortion in audio systems which brought to mind things in my general 
background. The result is that I now know that what I'm looking for will 
be under a heading related to "audio masking". I suspect the number I'm 
looking for will be in vicinity of 20-30 dB.


I was _THEN_ asked the purpose of my question.
It is to devise a scaling procedure for a *3D* representation of 
intensity vs frequency vs time. I commented, as an aside, that I had 
found *2D* representations (aka spectrograms) unsatisfactory.

So I was then hit with methods of possibly improving 2D methods in which 
I have no reason to be interested. Downhill from there.


Ben Bradley wrote:
> [snip OT discussion of displaying in 2D] > > You're looking at frequency vs. amplitude vs. time. I suspect one > reason you're not seeing what you want is the length of the FFT. If > it's long then it will smear higher frequency transients. Now that I > think about it, it will smear all transients. The FFT displays things > as if they were steady-state signals present for the whole duration of > the window.
NO

My FFT's (NOTA BENE the plural) cover up to tens of seconds. Currently I'm using 10 msec windows.

Why I don't see features is *STRICTLY* _AND_ *EXPLICITLY* a representation issue. There are large items. There are small items. I want to see details of each.

Now a foot-high object on a mountain may not be significant. But a foot-deep hole in your front walk may be. I want to see both in a single display.

The typical approach is a log plot. Not too bad for large features. Small features can be seen. *BUT* irrelevantly small features also become *CLUTTER*.

My input data may be 16 bit PCM, but my calculations are done in floating point with at least a 10^16 dynamic range. Obviously I can discard any points that are 2^16 smaller than my largest. The question then becomes "does the system being investigated raise the smallest significant value more?"
> [snip] > > Try to work with us, both Jim and I are trying to help you, and > you're being a bit cantankerous. > > Over what range of values would you be displaying for amplitude? > 20? 500? 50,000?
See above ;)

Signed
The not-so-cantankerous OP
Richard Owlett wrote:

   ...

> The fourth is "dynamic". That is an adjective.
Parse "There was a favorable group dynamic."

...

Jerry
--
Engineering is the art of making what you want from things you can get.