Is anyone familiar with the mathematics / theory behind Convolution Reverb, especially in regards to frequency?
I am a Computer Science student utilizing Convolution Reverb in my master's thesis, and was wondering if the following assertions were correct.
Consider the following example: a sound wave emanates from an amplifier and reflects off of 4 different surfaces (wooden shelves, plaster ceiling, plastic dishwasher and tile floor) before arriving at a microphone:
1. More or less of the sound wave will be absorbed upon reflection off a surface, depending on both the frequency of the sound wave (pre-reflection) and the material of the surface (rather than the surface material alone).
In other words, a 440 Hz (A4) sound wave may be absorbed / drop by 15%, after reflecting off a wooden surface.
Whereas a 392 Hz (G4) sound wave may be absorbed / drop by 24%, after reflecting off of the same wooden surface.
2. When a portion of a sound wave is absorbed upon reflection off a surface, the sound wave becomes a new sound wave / changes in pitch.
In other words, if a 440 Hz sound wave is absorbed by 15% after reflecting off of the first surface (a wooden shelf), the sound wave changes in pitch to 440 - (440 * 0.15) = 374 Hz.
Note: I understand that a 15% drop is an unrealistically large amount, but I'm merely using large numbers for demonstration purposes.
3. When a sound wave drops in frequency (due to being partially absorbed after reflecting off a surface), the sound wave increases in speed.
In other words, if a 440 Hz sound wave's wavelength is 78.41 centimeters, a 374 Hz sound wave is longer, and thereby reaches its final destination more quickly.
That said, I might have this reversed, and it could really be that a longer sound wave takes longer to reach its final destination.
If all of these assertions are correct, would anyone happen to know the mathematical equation for convolution reverb?
All I've been able to find on the web are various approximations to calculating convolution reverb (ex. Schroeder/Moorer Reverb Model), rather than a pure / non-approximate equation.
Keep in mind that I'm not attempting to code anything. I'm just trying to understand where frequency comes into play from a mathematical perspective, as both my Master's Committee Chair and I have only been able to model convolution reverb in terms of delay and amplitude changes over time.
However, if convolution reverb were simply delaying / dropping the amplitude of a signal over time, it could easily be replicated by a standard delay plugin, but there's more going on: a sound source recorded in a carpeted environment will sound duller / darker than the same sound source recorded in a concrete environment (due to the role that frequency plays in everything).
My sincerest thanks for any help, and apologies for the long winded question.
I think you need to find another web site to trust. You're getting some basic physics wrong.
1: Yes, different frequencies will be reflected at different amplitudes by any given material. But I would not expect such a dramatic difference as 24% vs. 15% for a frequency difference of only about 12%, at least not typically (special cases involving resonant structures or diffraction patterns could always be constructed).
2: No. Just no. Pitch will not change. The character of the sound will change because of the frequency-dependent reflection, which, since it will be acting on overtones, will potentially have a pretty dramatic effect. But without some serious nonlinearities (no) or Doppler (no, unless your kitchen is exploding), you wouldn't see pitch changes. An original sound with a significantly un-musical overtone structure might undergo an apparent pitch change, however.
3: https://en.wikipedia.org/wiki/Speed_of_sound. "The speed has a weak dependence on frequency and pressure in ordinary air..." The speed variation with frequency is almost certainly negligible, and if it matters at all it matters far more for the time-of-arrival of the overtones than of the fundamental.
I don't, unfortunately, know where to send you for more information. Basically the reverberation is going to be some transfer function (to add frequency dependency) plus delay for each surface. I don't know, but wouldn't be surprised, if there's not also a frequency-dependent directional characteristic; i.e., a given surface may reflect low-frequency sounds but scatter higher-frequency sounds; and those higher-frequency sounds would then potentially bounce off of various other bits of the room before coming to the microphone.
Hopefully someone else on this group is more involved in acoustics than I am -- I tend to alternate between amusement and bemusement when things get seriously audiophile.
Very slight "nit-pick" on Tim Wescott's response where he says "surprised, if there's not also a frequency-dependent directional characteristic; i.e., a given surface may reflect low-frequency sounds but scatter higher-frequency sounds;"
My own expectation would be that low-frequency sounds (long wavelength) would scatter, whereas high-frequency sounds (short wavelength) would reflect. So in this one instance I think Tim got it reversed. Scattering vs. reflection is going to be due to the size of the acoustic wavelength relative to the size of the object in the path. Think of ocean waves striking a piling (they scatter) vs. striking a cliff face (where they reflect). You can see this by looking at wave patterns in an enclosed bay.
Other than that, I'm in complete agreement with Tim's posting.
If you put "Convolution reverb" into Google, two "easy reading" sites are:
One final observation... At _very_ high acoustic intensities, air becomes a non-linear medium. This can be used to create audible sounds due to the interaction of two (or more) high-intensity sound waves that cause mixing products due to the non-linearity. But personally I wouldn't want my ears anywhere near such a high-intensity source. That's the only means I'm aware of besides Doppler shift (reflection off of a moving surface) for generating frequencies that weren't in the original sound source.
Sorry Joe, but I'm now going to nit-pick.
I believe Tim is correct. Maybe our definitions of reflect and scatter are different, but to me reflect implies bouncing off at the "reflected" angle, while scatter means coming off at all sorts of angles.
With this definition, low frequencies are more likely to "penetrate" carpet and "reflect" off the under surface, while higher frequencies will be absorbed or "scatter" off the uneven surfaces of the carpet fibers.
My 2 cents.
For a given size of thing, low-frequency sounds would certainly diffract more, and regardless of any technical definitions of "scatter" they'd get spread around.
So Joe's nit-pick sounds valid to me. And besides, he said "from experience" -- if he's got concrete evidence that it actually happens, then it's the theorist's job to say why, not to say it ain't so.
Part of my thinking relates to the way in which a recording studio is designed to avoid "modes". The walls are never parallel, the glass window - used by the control room to see in - is never vertical, and the treatment on the walls is set up to "scatter" the reflections. This scattering is achieved with either cones, egg cartons or sometimes strips of wood that protrude from the walls at varying depths. Some treatments are intended to absorb, but some are intended to scatter to break the paths that form modes.
All of this treatment is successful at high frequencies, but reverberation issues often remain at lower frequencies, because the modes cannot be eliminated there, suggesting to me that the low frequencies are still partially reflected rather than scattered.
At Bell Labs in Murray Hill, New Jersey, there is an anechoic chamber that has fiberglass cones that are 10 ft long on the walls, floor, ceiling and doors. (The chamber was once listed in the Guinness Book of Records as the quietest place on earth.) With all of that, the room is dead down to about 100 Hz, but below that there is still some minute reverberation.
My point is that the objects intended to scatter the sound have to be very large to get down to taking care of very low frequencies.
One point missed so far is that a reflected acoustic wave can undergo a 180 deg phase reversal like EM waves. See http://physics.stackexchange.com/questions/23847/w...
The reflected terms, each with its own phase, sum at each frequency to determine the final amplitude and phase of that frequency at the microphone or pickup device.
I would also suggest that the absorption vs. reflection coefficient(s) depend on the frequency relative to the size of the texture of the reflecting object. See http://hyperphysics.phy-astr.gsu.edu/hbase/Sound/r...
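One minimal way to write that sum, assuming each path k reaches the microphone with a frequency-independent amplitude a_k (negative for a phase-reversed path) and a delay τ_k (symbols chosen here for illustration):

```latex
H(f) = \sum_{k} a_k \, e^{-j 2\pi f \tau_k}
```

Because each term rotates at a different rate as f changes, |H(f)| rises and falls with frequency even though no frequency is ever shifted.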
Thanks for this reminder.
When I use the RIR software for simulating a room, I assign a positive sign to one and a negative sign to the other of two opposing surfaces. E.g. floor is positive, ceiling is negative, left wall is positive, right wall is negative, front of the room is positive, back is negative.
Generally, this gives a response with no DC component, which is good, since a DC component is something we would not expect to see.
The first answer is that your assertions are not correct.
Only the amplitude of the various frequency components is changed by the reflections off the surfaces.
Reflections off the floor tiles, glass windows or mirrors will be "bright", meaning that any attenuation of the signals is about the same for all frequencies.
Reflections off curtains or carpet tend to be "dull" meaning that high frequencies are attenuated more than lower frequencies.
The only way for the frequency (pitch) to change is for a non-linear process to get involved, which only happens under very undesirable and extraordinary circumstances. This is logical, or else a guitar tuned up by ear in a bathroom (highly reflective) would be out of tune in the living room (soft because of furniture, curtains and carpet).
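A small numerical illustration of "bright" vs. "dull" and of the pitch staying put. It assumes a hypothetical dull surface modelled as a one-pole low-pass filter (a deliberately crude stand-in, not a real material model) and a pure 440 Hz tone at a 48 kHz sample rate. The filter changes the level of each frequency differently, but the strongest spectral peak stays at 440 Hz.

```python
import numpy as np

fs = 48_000                       # sample rate in Hz (assumed)
t = np.arange(fs) / fs            # one second of sample times
x = np.sin(2 * np.pi * 440 * t)   # pure 440 Hz (A4) tone

def dull_reflection(signal, a=0.7):
    """Hypothetical 'dull' surface as a one-pole low-pass:
    y[n] = (1 - a) * x[n] + a * y[n - 1]  (linear and time-invariant)."""
    out = np.zeros_like(signal)
    prev = 0.0
    for n, s in enumerate(signal):
        prev = (1 - a) * s + a * prev
        out[n] = prev
    return out

def gain(f, a=0.7):
    """Magnitude response of dull_reflection at frequency f (Hz)."""
    w = 2 * np.pi * f / fs
    return abs((1 - a) / (1 - a * np.exp(-1j * w)))

def peak_hz(sig):
    """Frequency of the strongest FFT bin (1 Hz resolution for 1 s of audio)."""
    spectrum = np.abs(np.fft.rfft(sig))
    return np.argmax(spectrum) * fs / len(sig)

y = dull_reflection(x)
print(peak_hz(x), peak_hz(y))     # same pitch before and after the "reflection"
print(gain(100), gain(5000))      # but highs are attenuated more than lows
```

Any linear filter behaves this way: it can only rescale and phase-shift each frequency that is already present.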
Second, at least at your level of concern, all frequencies of sound travel at the same speed. The fact that the wavelength is longer or shorter has no effect on the speed, only on the number of cycles of the specific frequency that exist in a certain distance.
As an example, find a long hallway in a building, preferably with smooth walls on each side and a flat surface (door, wall or window) at the end. Snap your fingers or click your tongue (a very short snap/impulse contains all frequencies) and listen for the reflection off the end wall: you will hear the snap come back as a snap, meaning all frequencies arrived back at the same time. The hall has to be pretty long for you to hear the delayed response.
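The wavelength/speed relationship can be checked with lambda = c / f. This sketch assumes c = 343 m/s (air at roughly 20 °C) and a hypothetical 10 m path. For comparison, the 78.41 cm figure quoted in the question corresponds to c ≈ 345 m/s, i.e. slightly warmer air. Lower frequencies get longer wavelengths, but the arrival time depends only on c.

```python
def wavelength_and_arrival(f, distance, c=343.0):
    """Return (wavelength in m, travel time in s) for a tone of frequency f (Hz)
    over a path of `distance` metres. c is the speed of sound (assumed 343 m/s)."""
    return c / f, distance / c

for f in (374.0, 392.0, 440.0):
    lam, t = wavelength_and_arrival(f, distance=10.0)
    print(f"{f:.0f} Hz: wavelength {lam * 100:.2f} cm, arrival after {t * 1000:.2f} ms")
```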
Third. You want to look for references on Room Impulse Response (RIR). The Schroder model essentially unfolds all the surfaces of a room, and replicates them as mirror images in repeating patterns laid out onto a flat surface. The source and the sink of the sound appear in all the reflected copies of the room layout. The trick is to define the reflective characteristics of all the places where the sound interacts with a surface.
See diagrams in http://www.umiacs.umd.edu/~ramani/cmsc828d_audio/8...
The overall impulse response of the room is the sum of each of these reflections, placed at the right point in time based on its path length and given the right attenuation. Remember also that sound spreading from a source follows an inverse-square (1/R^2) law for intensity, which means that the sound pressure amplitude drops by half (about 6 dB) for every doubling of the distance from the source.
If you are familiar with MATLAB, there are code examples that model the characteristics of a room assuming simple reflections with no frequency selectivity.
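A minimal sketch of that summation in Python, reduced to one dimension for brevity. It assumes a source and microphone between two parallel walls, the same frequency-independent reflection coefficient beta at every bounce, and 1/d amplitude spreading. The function name and all parameter values are hypothetical. Each mirror image contributes one delayed, attenuated impulse to the room impulse response.

```python
import numpy as np

def image_source_ir_1d(room_len, src, mic, beta=0.8, c=343.0, fs=48_000,
                       max_order=20, ir_seconds=0.5):
    """Hypothetical 1-D image-source sketch: source and mic sit between two
    parallel walls at x = 0 and x = room_len; every bounce multiplies the
    amplitude by the same frequency-independent reflection coefficient beta."""
    h = np.zeros(int(fs * ir_seconds))
    for n in range(-max_order, max_order + 1):
        # Two families of mirror images and the number of wall bounces for each
        for image, bounces in ((2 * n * room_len + src, abs(2 * n)),
                               (2 * n * room_len - src, abs(2 * n - 1))):
            d = abs(image - mic)              # unfolded path length in metres
            k = int(round(d / c * fs))        # arrival time in samples
            if k < len(h):
                # 1/d spreading: amplitude halves per doubling of distance
                h[k] += beta ** bounces / max(d, 1e-3)
    return h

h = image_source_ir_1d(room_len=10.0, src=2.0, mic=5.0)
first = np.flatnonzero(h)[0]
print(first, h[first])   # direct sound: 3 m away, amplitude 1/3
```

A real 3-D model repeats the same idea for all six walls and usually gives each surface its own (often frequency-dependent) reflection characteristic.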
Also take a look at http://www.dspalgorithms.com/room/room25.html I have not used it, but it might give you some ideas.
When the overall RIR model is assembled - through the summation mentioned - it becomes a filter through which the sound has travelled from source to sink. We implement filters using convolution, so I think that is where the term used in your original definition of the problem comes from.
I hope this is helpful.
Your questions are very interesting.
My background is in electromagnetic (EM) waves, but I feel that some wave attributes may apply to sound waves. In general, sound waves travel faster in denser materials, which is contrary to EM.
A1) When a wave is reflected or scattered by an object, the amplitude of the scattered wave depends on the incident angle, the material (which you pointed out), and the frequency of the wave. Sound audible to humans lies between 20 Hz and 20 kHz. The refraction theory of optics is used extensively in radar applications. I would not be surprised if the result showed a much higher loss of signal for sound waves, as is true for EM. You may want to study Huygens' principle.
A2) The frequency of the reflected sound wave depends on the radial velocities of the source, the reflector and the observer.
If an ambulance is coming toward you, the pitch/frequency increases, roughly in proportion to the radial velocity (speed) of the ambulance when that speed is small compared with the speed of sound. If it is moving away from you, the frequency decreases. This phenomenon is known as the Doppler effect, which is again used extensively in radar applications.
A3) The speed of a sound wave depends on the medium it is travelling in. For instance, sound travels faster in metal than in air. Native Americans used to listen to the ground or the railroad tracks to detect approaching horses or locomotives, respectively.
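The Doppler shift described in A2 can be sketched with the textbook formula for motion along the line joining source and observer in still air. The 343 m/s speed of sound, the sign convention and the 700 Hz siren tone are assumptions made for illustration.

```python
def doppler_shift(f_src, v_src, v_obs=0.0, c=343.0):
    """Observed frequency (Hz) for motion along the line joining source and
    observer in still air. Sign convention (assumed here): v_src > 0 means the
    source moves toward the observer, v_obs > 0 means the observer moves
    toward the source. c is the speed of sound in m/s."""
    return f_src * (c + v_obs) / (c - v_src)

siren = 700.0                              # hypothetical siren tone, Hz
print(doppler_shift(siren, 25.0))          # ambulance approaching at 25 m/s: higher pitch
print(doppler_shift(siren, -25.0))         # ambulance receding at 25 m/s: lower pitch
```

For speeds much smaller than c, the shift is approximately f * v / c, which is the "proportional to the radial velocity" behaviour described above. A stationary wall between a stationary source and microphone produces no shift at all.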
Hope that helps,
dgshaw6 gave a very good reply to this.
My own experience is in underwater acoustics. You might find "A Conceptual Model of Reverberation in the Ocean" to be of some interest. This is about a reverberation generator approach called REVGEN which was developed at the University of Washington Applied Physics Laboratory. I was interested in doing reverberation generation for complex structures with high speed array processors for multiple transmit and receive beams (as in microphones and beamformers).
The whole idea really revolves around a 3-D "reverberation density matrix" which is constructed as a function of range and solid angle. So, one is able to place a quantized reflector in each element of the matrix and assign an amplitude and a radial velocity to it. Appropriate use of the FFT to do the computations efficiently is part of it.
Assigning velocity to the reflectors is important because often the transmitters and reflectors have relative radial motion. This causes a Doppler shift. Otherwise, as dgshaw6 said, there is no change in frequency and certainly not in the speed of sound. In fact, the wavelength is derived from the speed of sound, just as for radio waves and the speed of light.
But then this isn't a closed form non-approximate equation approach.
If you look up "convolution" in wikipedia, you will most likely find the formula for discrete convolution (convolution on samples) to be exactly what I pointed out above.
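For reference, this is the standard discrete convolution of an input x[n] with an M-tap impulse response h[k] (symbols chosen here for illustration):

```latex
y[n] = \sum_{k=0}^{M-1} h[k] \, x[n-k]
```

Each output sample is a weighted sum of delayed input samples, and the IR samples supply the weights.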
If you dig into the signal processing, you'll find that having different amplitudes for different delays will ultimately attenuate certain frequencies and amplify others.
Mathematically, a convolution reverb is a very simple thing. It is easier to implement than any other form of reverb. The only problem is that the standard implementation is computationally very expensive. The art is to build a convolution reverb that is easy on resources.
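The smallest possible case makes this visible. The sketch assumes an impulse response with only the direct sound plus one reflection 1 ms later at half amplitude (hypothetical numbers). Evaluating its frequency response shows dips of 1 - g at odd multiples of 500 Hz and peaks of 1 + g at multiples of 1 kHz (a comb filter), even though the IR contains nothing but delays and gains.

```python
import numpy as np

fs = 48_000
delay = 48     # reflection arrives 1 ms after the direct sound (assumed)
g = 0.5        # reflection amplitude relative to the direct sound (assumed)

h = np.zeros(delay + 1)   # impulse response: direct path + one reflection
h[0] = 1.0
h[delay] = g

def magnitude_db(h, f):
    """|H(f)| in dB, evaluated directly from the FIR taps."""
    n = np.arange(len(h))
    return 20 * np.log10(abs(np.sum(h * np.exp(-2j * np.pi * f * n / fs))))

for f in (500.0, 1000.0, 1500.0, 2000.0):
    print(f, round(magnitude_db(h, f), 2))
```

Setting g = -0.5 (a phase-reversed reflection) swaps the peaks and dips. With many reflections, the same mechanism produces the complex frequency coloration of a real room.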
In a convolution reverb you take a measured impulse response of the system you want to model and take the samples of this impulse response as amplitude factors for your sample delays. In other words: You convolve the input signal with the impulse response.
I have made convolution reverbs - if you do it in a straightforward fashion, the convolution algorithm takes about two to four lines of C code.
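To illustrate those few lines, here is a sketch in Python rather than C (the language is my choice, since the thread shows no code). The nested loop is the direct, expensive form. The FFT version is one common way to make it cheaper, which addresses the efficiency problem mentioned above. Real-time convolution reverbs usually go further with partitioned FFT convolution, which is not shown. The signals are random toy data.

```python
import numpy as np

def convolve_direct(x, h):
    """Straightforward convolution reverb, O(len(x) * len(h)).
    Each input sample adds a copy of the IR, delayed to its position and
    scaled by its value (the 'sum of shifted, scaled IR copies' view)."""
    y = np.zeros(len(x) + len(h) - 1)
    for n in range(len(x)):
        for k in range(len(h)):
            y[n + k] += x[n] * h[k]
    return y

def convolve_fft(x, h):
    """Same result via the frequency domain, which is what makes long IRs
    affordable: zero-pad so circular convolution equals linear convolution."""
    n = len(x) + len(h) - 1
    nfft = 1 << (n - 1).bit_length()          # next power of two >= n
    Y = np.fft.rfft(x, nfft) * np.fft.rfft(h, nfft)
    return np.fft.irfft(Y, nfft)[:n]

rng = np.random.default_rng(0)
dry = rng.standard_normal(1000)                               # toy dry signal
ir = rng.standard_normal(200) * np.exp(-np.arange(200) / 50)  # toy decaying IR
wet = convolve_fft(dry, ir)
```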
Out of curiosity, do you agree with the following definition of convolution reverb (http://dsp.stackexchange.com/questions/4723/what-is-the-physical-meaning-of-the-convolution-of-two-signals):
Especially the part where the person states:
"You can think of the output y(t) as the sum of an infinite number of copies of the impulse response, each shifted by a slightly different time delay (r) and scaled according to the value of the input signal at the value of t, that corresponds to the delay: x(r)".
Also, would you say it's somewhat trivial to derive an IR (impulse response) from a clean signal and reverberated signal (using the IR)?
In other words, if I had an IR and a dry signal, and I gave you the dry signal and the reverberated signal (i.e. IR(dry signal)), would you be able to derive the IR, exactly?
What you are after is related to a fairly standard system identification problem. Very often this identification is made using an adaptive filter. Echo cancellation falls into this field or category.
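Here is a minimal sketch of the adaptive-filter route. It assumes a normalised LMS (NLMS) filter, noise-free signals, white-noise excitation and a made-up 6-tap impulse response. Real rooms need far more taps and are subject to the limitations listed below. The filter adapts its taps until filtering the dry signal reproduces the wet one.

```python
import numpy as np

def nlms_identify(dry, wet, taps, mu=0.5, eps=1e-8):
    """Normalised LMS: adapt an FIR estimate w so that w applied to dry tracks wet."""
    w = np.zeros(taps)
    buf = np.zeros(taps)               # most recent dry samples, newest first
    for n in range(len(dry)):
        buf = np.roll(buf, 1)
        buf[0] = dry[n]
        err = wet[n] - w @ buf         # a-priori error
        w += mu * err * buf / (buf @ buf + eps)
    return w

rng = np.random.default_rng(1)
true_ir = np.array([1.0, 0.0, -0.5, 0.25, 0.0, 0.1])   # toy IR (hypothetical)
dry = rng.standard_normal(5000)                        # white-noise excitation
wet = np.convolve(dry, true_ir)[:len(dry)]
est = nlms_identify(dry, wet, taps=len(true_ir))
```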
There are however some serious limitations:
1) Noise. If there is noise in the reverberant signal, then finding the IR is a little bit more difficult, but not impossible if you have access to the signals and can re-run them over and over if needed.
2) Non-linearity. If there is any non-linearity in the system (e.g. a speaker approaching saturation), then the problem becomes much more difficult. Also, if the signal clips (either digitally or in any analog component), then the results will be unreliable.
3) Signal content. If the "excitation" has only limited frequency content, then it is either difficult or maybe even impossible to find the IR. The spectrum of the "dry" signal must excite all the characteristics of the response to find it accurately. So for example a single tone will tell you nothing about the reverb of the room.
4) Signal level. This is closely related to the noise. If the signal is not loud enough either in the excitation process, or the recording process, then the results will be off from the ideal. If the signals are too small then the quantization noise will limit performance.
5) Modeling length. If the model used in the adaptive filter is significantly shorter than the impulse response, then the result will be inaccurate. This also means that in the "real" world, where things are Infinite Impulse Response (IIR) rather than Finite (FIR), the conventional FIR adaptive filter almost by definition cannot completely solve the problem.
However, in the world of acoustics, there is a term used to define the important part of the reverberation time. Typically T60 is used: the time it takes for the impulse response to decay by 60 dB from its peak. Other Txx values (for example T20 or T30) are sometimes used too. If we go with this definition, then an adaptive filter that spans the T60 time will usually be enough to find the IR with sufficient accuracy to call it good.
6) Sampling offset. This one is often forgotten. If the signal driving the system is driven by a D/A and thence a speaker (per your original diagram) and the microphone and thence its A/D are driven off exactly the same clock then things are good. However, if the clocks are different, then some clever adjustments have to be made to synchronize things.
Many years ago, I worked with a team building the receivers for Sirius Satellite radio. In that system, the recorded signal is encoded and packetized at a certain clock rate at the source. The car (or other) receiver, has no access to that clock, and has its own oscillator for the D/As in the receiver. If we merrily receive the data and play it out at the D/A clock rate, then we will either starve for samples or overflow our buffer over time. We had to create a clever synchronization scheme to "re-sample" the decoded received data to match the receiver's D/A clock. Very similar to the transmitter and receiver in a modem.
TMI Sorry for the long winded answer.
But basically yes, we can find the IR by having access to the "dry" signal and the reverberant signal.
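The T60 criterion in point 5 can be sketched numerically. This assumes Schroeder backward integration (a standard way to get the energy decay curve from an IR), a T20-style line fit between -5 and -25 dB extrapolated to -60 dB, and a synthetic, purely exponential IR with a 0.5 s decay. The last line converts T60 into the number of FIR taps an adaptive filter would need at 48 kHz.

```python
import numpy as np

def t60_from_ir(h, fs, fit_db=(-5.0, -25.0)):
    """Estimate T60 via Schroeder backward integration: fit a line to the
    energy decay curve between fit_db[0] and fit_db[1], extrapolate to -60 dB."""
    edc = np.cumsum(h[::-1] ** 2)[::-1]          # energy remaining from each sample on
    edc_db = 10 * np.log10(edc / edc[0])
    idx = np.where((edc_db <= fit_db[0]) & (edc_db >= fit_db[1]))[0]
    slope, _ = np.polyfit(idx, edc_db[idx], 1)   # dB per sample (negative)
    return -60.0 / slope / fs

fs = 48_000
t60_true = 0.5                                   # seconds (assumed)
a = 3 * np.log(10) / (t60_true * fs)            # amplitude decay giving 60 dB in t60_true
h = np.exp(-a * np.arange(4 * int(t60_true * fs)))
print(t60_from_ir(h, fs))                        # close to 0.5
print(int(round(t60_true * fs)))                 # taps needed to span T60: 24000
```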
Thank you for the helpful and detailed response (the same thanks goes to everyone else on this thread).
You've all helped me greatly in understanding Convolution Reverb, and for that I'm sincerely grateful.
Sorry for the hasty formulas below, I was too lazy to make real formulas.
Yes, I agree with that. And could I retrieve the impulse response from a dry and a wet signal? Yes and no. Basically yes:
Wet = Dry convolved with Imp
If you transform this to the frequency domain you get
WET = DRY * IMP
IMP = WET / DRY
Transform back to time domain and you have Imp.... Kind of....
This only works if DRY is non-zero for all frequencies - otherwise you'd have divisions by zero. Also, DRY should have a significant amount of signal energy at all frequencies - otherwise the division would be dominated by noise. There is a process called Wiener filtering that accounts for the signal-to-noise ratio in the formula above.
You could also use adaptive filtering to identify the Impulse Response, but this only works well for short impulses - at least in the time domain.
The formula above should work reasonably well if you use white noise as the dry signal.
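A minimal NumPy sketch of IMP = WET / DRY, following the post's suggestion of white noise as the dry signal. Assumptions: clean, noise-free signals; a toy 256-tap decaying IR; FFTs long enough that circular convolution equals linear convolution; and a tiny constant eps in the denominator as a crude guard against near-zero bins. This is not a real Wiener filter.

```python
import numpy as np

def estimate_ir(dry, wet, ir_len, eps=1e-10):
    """IMP = WET / DRY in the frequency domain, written as
    WET * conj(DRY) / (|DRY|^2 + eps) to avoid dividing by ~0."""
    n = len(wet)                              # wet is the full linear convolution
    D = np.fft.rfft(dry, n)
    W = np.fft.rfft(wet, n)
    H = W * np.conj(D) / (np.abs(D) ** 2 + eps)
    return np.fft.irfft(H, n)[:ir_len]

rng = np.random.default_rng(2)
true_ir = rng.standard_normal(256) * np.exp(-np.arange(256) / 40)  # toy decaying IR
dry = rng.standard_normal(4096)              # white-noise dry signal
wet = np.convolve(dry, true_ir)              # length 4096 + 256 - 1
est = estimate_ir(dry, wet, len(true_ir))
print(np.max(np.abs(est - true_ir)))         # tiny for clean, noise-free signals
```

With real recordings, noise in WET is divided by DRY too, so bins where DRY is weak dominate the error. Proper Wiener filtering handles this by weighting each bin by its signal-to-noise ratio.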
Thank you for your response.
One last question for you, if I wanted to make it impossible for someone to derive an IR (100% accurately) from both a dry signal and reverberated signal (using a particular IR), can I assume that the only way to do that is if the dry signal is:
1. Zero for some frequencies (as you previously mentioned)
2. Long, time-wise (if using Adaptive Filtering - as you previously mentioned).
3. Low in signal energy at all frequencies (as you previously mentioned).
Furthermore, are you aware of any other way of preventing someone from being able to exactly replicate an IR if they had access to a dry signal and a reverberated signal?
Thank you very much for all your help.
Wow! You have just set up a "contest" to see if we can do it.
I love being a sleuth. :-)
1) Doing what you said means that you can never send any of those frequencies, ever (remember that any impulsive sound, such as pops, clicks, drums and other percussion, carries all frequencies). Removing individual frequencies will make the sound unpleasant for the listener, and you would have to remove a lot of the bandwidth.
2) I have run adaptive filters that are longer than most room reverberation you would ever want. At the moment I'm working on a relatively simple problem that has over 500 msec of reverb. At a 48 kHz sampling rate that is 24000 taps, which is doable if the signals are clean. Now, if you are trying to replicate a cathedral (2-5 sec reverb), then you may have made my life quite hard.
3) Small signals will just make the task harder and take longer to solve, but not impossible.
What "exactly" do you mean by "exactly replicate"?
If you are going to create some Intellectual Property (IP) that is an IR, then I believe that someone could come close enough, (if not exactly replicate) so that the average listener could not tell the difference.
I'm happy to be your guinea pig! :-)
I think that my answer is basically no. You will have a hard time stopping "everyone" from doing what you said.
A little bit of history with trying to stop people circumventing audio IP. Many years ago, when DAT was first standardized, the sampling rate was chosen as 48 kHz so that "no-one" would be able to digitally transfer from CD (44.1 kHz) to tape, thus preventing the preservation of the quality of digital without having to go back to analog. "Wrong!" That just gave some of us a nifty little task to solve. The ratios are nasty, but those of us who work in multi-rate domains just had to work some filters out with 160/147 resampling ratio. Done. We have our ways.
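The 160/147 ratio follows directly from the two sample rates: 48000/44100 reduced to lowest terms (their greatest common divisor is 300). A minimal check is shown below. Actual resampling would then need a polyphase filter, for example SciPy's `scipy.signal.resample_poly(x, 160, 147)`, which is not shown here.

```python
from fractions import Fraction

ratio = Fraction(48_000, 44_100)   # DAT rate over CD rate, reduced automatically
print(ratio)                        # 160/147: upsample by 160, downsample by 147
```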
Thank you for the response.
As a clarification, when I say "exactly match an IR", what I mean is:
Suppose some person (let's call them X) has an IR and a dry signal, and reverberates the dry signal using that IR.
If person X then gave their dry signal and reverberated signal to some other person (let's call them Y), would Y be able to derive X's IR so that Y could also reverberate X's dry signal (using Y's derived IR) such that X's and Y's reverberated signals completely cancel when combined with the phase of one of them inverted?
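The cancellation test described here is a "null test". Below is a minimal sketch that measures how deep the null is. It assumes toy random signals and a hypothetical estimate by Y that is off by 0.01% in every IR tap. The result is the residual energy relative to X's wet signal in dB, where -inf means perfect cancellation.

```python
import numpy as np

def null_depth_db(wet_x, wet_y):
    """Energy left after adding X's wet signal to a phase-inverted copy of Y's,
    relative to the energy of X's wet signal, in dB (-inf = perfect null)."""
    residual = wet_x - wet_y          # summing with one signal inverted
    e_res = np.sum(residual ** 2)
    if e_res == 0.0:
        return -np.inf
    return 10 * np.log10(e_res / np.sum(wet_x ** 2))

rng = np.random.default_rng(3)
dry = rng.standard_normal(2000)
ir_x = rng.standard_normal(100) * np.exp(-np.arange(100) / 20)   # X's IR (toy)
ir_y = ir_x * (1 + 1e-4)            # Y's estimate, off by 0.01 % (hypothetical)
wet_x = np.convolve(dry, ir_x)
wet_y = np.convolve(dry, ir_y)
print(null_depth_db(wet_x, wet_x), null_depth_db(wet_x, wet_y))
```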
Thank you once again,
What you require is certainly the Nth level of exactness.
However, if you just want to cancel enough that the residual is pretty small and basically imperceptible, then that is doable.
If however, you are going to use the IR as an encryption key (for instance, aha!) then we would need to be exact.
If that were the aim, I would guess that the encryption guys could get close enough with the techniques I outlined to then dither the lsbs of the filter coefficients until they found the correct key, as could someone trying to find the exact residual errors if the problem were purely audio.
Hope I haven't spoiled something.
Once again I can't thank you enough for all your help, as you've truly helped me understand convolution reverb.
Thanks for your kind response.
I guess that you can tell that I have a passion for this stuff.
Best wishes for your research.
Would it be possible to contact you privately via my email address: firstname.lastname@example.org
...as I had a few more questions I was hoping I could ask you directly.
My apologies for the delay.
I have just sent an email from my private account.
I agree with dgshaw6.
If you have complete control over every dry signal that's handed over to person Y, you might make Y's life harder by introducing a very sharp notch at a very low or very high frequency that would hardly be audible, but would make deconvolution a hard task...
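To see why a sharp notch hurts deconvolution, here is a sketch that uses the notch biquad formulas from Robert Bristow-Johnson's "Audio EQ Cookbook" (a widely used reference, not something the thread specifies). It assumes a hypothetical notch at 20 Hz with Q = 30 at 48 kHz. The notch leaves almost nothing at its centre frequency while barely touching the rest of the band, so any IMP = WET / DRY estimate has to divide by almost nothing in that narrow band.

```python
import numpy as np

def rbj_notch(f0, q, fs):
    """Notch biquad coefficients (b, a), normalised so that a[0] == 1."""
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1.0, -2 * np.cos(w0), 1.0])
    a = np.array([1 + alpha, -2 * np.cos(w0), 1 - alpha])
    return b / a[0], a / a[0]

def response(b, a, f, fs):
    """Complex frequency response of the biquad at f (Hz)."""
    z = np.exp(-2j * np.pi * f / fs * np.arange(3))   # [1, z^-1, z^-2]
    return np.dot(b, z) / np.dot(a, z)

fs = 48_000
b, a = rbj_notch(f0=20.0, q=30.0, fs=fs)   # hypothetical: sharp notch at 20 Hz
print(abs(response(b, a, 20.0, fs)))      # ~0: DRY keeps (almost) no energy here
print(abs(response(b, a, 1000.0, fs)))    # ~1: the rest of the band is untouched
```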