Dynamic Geometric-Based Reverb
-
The study of the propagation of sound with rays is known as geometric acoustics, or ray acoustics
-
Since light and sound can both be represented as waves or rays, we can apply concepts from light rendering to audio
-
Applying ray tracing to audio allows us to replicate the effect known as reverb
-
This allows for more realistic sounding virtual environments
Reverberation
-
Also called reverb
-
A collection of reflected sound waves that persist after the original sound
-
Based on environmental factors
-
Volume and geometry of the room​
-
Material absorption factors
-
How much energy a material absorbs​
-
-
Scattering factors​
-
Different materials scatter energy differently​
-
-
Below is an example of reverb:
Piano scale no reverb
Piano scale with reverb
Common Techniques
-
Algorithmic
-
Can sound realistic
-
Customizable parameters
-
Delay, room size, density, frequency filtering, etc.​
-
-
Does not represent a real place​
-
-
Pre-Recorded Impulse Response​ (IR)
-
Recorded in a real place
-
Captures scattering, material absorption, and the size of the room
-
-
Normally produced by a clap, popping a balloon, a gunshot, etc.​
-
Requires equipment to be taken to a location
-
Uses convolution to apply the IR to the sound
-
Impulse Response
-
Reaction of a dynamic system to an external change
-
This determines how much reverb is added to a signal
-
Made of 3 parts
-
Direct path​
-
Direct unblocked path from the source to the listener​
-
-
Early or specular reflections​
-
When the sound bounces perfectly off an object​
-
-
Late or diffuse reflections​
-
These reflections are spawned when sound hits an object and is scattered​
-
Diffuse reflections are not calculated in this model​​
-
Every bounce would spawn additional rays, increasing the total number of rays exponentially
-
-
-
[Diagrams: direct path, early reflections, and late reflections]
Setup of Project
-
In order to get into the implementation of the impulse response generation and convolution, we need to set up a few things
-
Goal of the thesis artifact
-
Play 3D audio and apply audio effects​
-
Generate an impulse response
-
Ray tracing​
-
Compute shaders for parallelization of generating rays
-
-
Apply an impulse response to an audio signal​
-
Fast Fourier Transform and its inverse​
-
Convolution
-
-
Goal:​
-
Audio with reverb that matches the virtual environment​
-
-
Audio Engine
-
This project was done in my personal C++ game engine
-
XAudio2 API
-
Made for game developers​
-
The engine already uses a DirectX11 renderer, so XAudio2 seemed like the most reasonable choice
-
-
X3DAudio​
-
Extension to XAudio2​
-
Allows the use of listeners and emitters
-
-
XAPO ​
-
These are Cross-Platform Audio Processing Objects​
-
This allows us to define our own audio effects
-
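As a rough illustration of that setup, below is a minimal sketch of initializing XAudio2 and X3DAudio, assuming XAudio2 2.8+ on Windows 8 or later; error handling, XAPO effect registration, and COM cleanup are omitted, and InitAudio is a hypothetical helper name, not the project's API.

```cpp
#include <xaudio2.h>
#include <x3daudio.h>
#pragma comment(lib, "xaudio2.lib")

// Minimal sketch: create the engine, a mastering voice, and an X3DAudio
// instance that matches the output speaker configuration.
bool InitAudio(IXAudio2** outXAudio2, X3DAUDIO_HANDLE x3dInstance)
{
    if (FAILED(XAudio2Create(outXAudio2, 0, XAUDIO2_DEFAULT_PROCESSOR)))
        return false;

    // The mastering voice feeds the audio device; X3DAudio needs its
    // channel mask to position emitters around the listener.
    IXAudio2MasteringVoice* masterVoice = nullptr;
    (*outXAudio2)->CreateMasteringVoice(&masterVoice);

    DWORD channelMask = 0;
    masterVoice->GetChannelMask(&channelMask);

    X3DAudioInitialize(channelMask, X3DAUDIO_SPEED_OF_SOUND, x3dInstance);

    // X3DAUDIO_LISTENER / X3DAUDIO_EMITTER structs can then be fed to
    // X3DAudioCalculate each frame to position sounds in 3D.
    return true;
}
```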
Ray Tracing
-
We use rays to propagate the sound energy across a space
-
Ray versus voxel grid
-
The world geometry is made of 1 x 1 x 1 blocks in a voxel grid pattern​
-
There is an algorithm that computes ray-versus-voxel-grid intersections very quickly
-
It exploits the fact that the ray crosses voxel boundaries at set intervals; a sketch of this traversal appears after this list
-
Below, the concept is shown on a 2D grid, but it can be applied in 3D as well.
-
When the ray hits one of the walls, the floor, or the ceiling, it reflects
-
We also keep track of a few variables​
-
Increment the number of bounces​
-
Compute the reflected energy
-
Add the ray length to the total path length
-
-
-
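Here is a minimal sketch of that traversal, in the style of Amanatides and Woo's fast voxel traversal, assuming the 1 x 1 x 1 voxels described above; Vec3, IsSolid, and TraceVoxelGrid are hypothetical names for illustration.

```cpp
#include <cmath>
#include <limits>

// Because voxels are unit cubes, the ray crosses x-, y-, and z-aligned
// planes at fixed parametric intervals (tDelta), so each step just
// advances into whichever axis has the nearest boundary crossing (tMax).
struct Vec3 { float x, y, z; };

bool TraceVoxelGrid(Vec3 origin, Vec3 dir, bool (*IsSolid)(int, int, int),
                    int maxSteps, int& hitX, int& hitY, int& hitZ)
{
    const float inf = std::numeric_limits<float>::infinity();

    // Current voxel coordinates.
    int ix = (int)std::floor(origin.x);
    int iy = (int)std::floor(origin.y);
    int iz = (int)std::floor(origin.z);

    // Step direction along each axis.
    int stepX = dir.x >= 0 ? 1 : -1;
    int stepY = dir.y >= 0 ? 1 : -1;
    int stepZ = dir.z >= 0 ? 1 : -1;

    // t advance needed to cross one full voxel along each axis.
    float tDeltaX = dir.x != 0 ? std::fabs(1.0f / dir.x) : inf;
    float tDeltaY = dir.y != 0 ? std::fabs(1.0f / dir.y) : inf;
    float tDeltaZ = dir.z != 0 ? std::fabs(1.0f / dir.z) : inf;

    // t at which the ray first crosses the next voxel boundary per axis.
    float tMaxX = dir.x != 0 ? ((stepX > 0 ? (ix + 1 - origin.x) : (origin.x - ix)) / std::fabs(dir.x)) : inf;
    float tMaxY = dir.y != 0 ? ((stepY > 0 ? (iy + 1 - origin.y) : (origin.y - iy)) / std::fabs(dir.y)) : inf;
    float tMaxZ = dir.z != 0 ? ((stepZ > 0 ? (iz + 1 - origin.z) : (origin.z - iz)) / std::fabs(dir.z)) : inf;

    for (int i = 0; i < maxSteps; ++i)
    {
        // Step into the axis whose boundary crossing comes first.
        if (tMaxX <= tMaxY && tMaxX <= tMaxZ) { ix += stepX; tMaxX += tDeltaX; }
        else if (tMaxY <= tMaxZ)              { iy += stepY; tMaxY += tDeltaY; }
        else                                  { iz += stepZ; tMaxZ += tDeltaZ; }

        if (IsSolid(ix, iy, iz)) { hitX = ix; hitY = iy; hitZ = iz; return true; }
    }
    return false; // no solid voxel hit within maxSteps
}
```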
Ray versus sphere
-
The listening and emitting points are infinitely small​
-
Almost impossible to hit with a ray​
-
-
Spheres are used instead to give volume and increase the chance of the ray hitting​
-
-
When the ray hits the sphere​
-
We record the number of bounces, the energy remaining, and the final ray length added to the total path length
-
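Here is a minimal sketch of that ray-versus-sphere test, solving the standard quadratic for the intersection distance; Vec3, Dot, and RayHitsSphere are illustrative helpers, not the project's API.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static float Dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Solve |origin + t*dir - center|^2 = radius^2 for t; a hit gives the
// distance to add to the ray's total path length.
bool RayHitsSphere(Vec3 origin, Vec3 dir, Vec3 center, float radius, float& tHit)
{
    Vec3 oc { origin.x - center.x, origin.y - center.y, origin.z - center.z };
    float a = Dot(dir, dir);
    float b = 2.0f * Dot(oc, dir);
    float c = Dot(oc, oc) - radius * radius;
    float disc = b * b - 4.0f * a * c;
    if (disc < 0.0f) return false;          // ray misses the sphere

    float t = (-b - std::sqrt(disc)) / (2.0f * a);
    if (t < 0.0f) return false;             // intersection behind the ray
                                            // (origin inside the sphere is
                                            // not handled in this sketch)
    tHit = t;
    return true;
}
```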
Compute Shaders
-
We can utilize the GPU for the extra computation power that is needed
-
The more rays traced, the more accurate the simulation
-
The current simulation uses over 16 million rays that bounce 30 times​
-
The reason for so many is that most of them likely won't hit the target
-
-
It's part of the programmable shader pipeline in DirectX11​
-
Because I have a DirectX11 renderer, it was a matter of just hooking it up to the system​
-
-
Need to precompile the shader​
-
The shader is long and can take several minutes to compile​
-
This allows us to start the program significantly faster
-
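As a rough sketch of how this might be hooked up in DirectX11, the following loads a precompiled compute shader and dispatches it with one thread per ray; the file name RayTrace.cso and the 1024-thread group size are assumptions for illustration, and error handling is omitted.

```cpp
#include <d3d11.h>
#include <d3dcompiler.h>
#pragma comment(lib, "d3d11.lib")
#pragma comment(lib, "d3dcompiler.lib")

void DispatchRayTrace(ID3D11Device* device, ID3D11DeviceContext* context,
                      ID3D11UnorderedAccessView* rayResultsUAV, UINT numRays)
{
    // Load bytecode compiled offline (e.g., with fxc); this is what lets
    // the program skip the multi-minute shader compile at startup.
    ID3DBlob* blob = nullptr;
    D3DReadFileToBlob(L"RayTrace.cso", &blob);

    ID3D11ComputeShader* cs = nullptr;
    device->CreateComputeShader(blob->GetBufferPointer(), blob->GetBufferSize(),
                                nullptr, &cs);
    blob->Release();

    // Bind the shader and the output buffer the rays write their hits into.
    context->CSSetShader(cs, nullptr, 0);
    context->CSSetUnorderedAccessViews(0, 1, &rayResultsUAV, nullptr);

    // Assuming the shader declares [numthreads(1024, 1, 1)], launch enough
    // groups to cover every ray (16,777,216 rays -> 16,384 thread groups).
    context->Dispatch((numRays + 1023) / 1024, 1, 1);
}
```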
Fast Fourier Transform
-
Optimized algorithm to compute the Discrete Fourier Transform
-
Fast Fourier Transform (FFT) is used to transform a signal from the time domain to the frequency domain
-
Its inverse, the IFFT, is used to transform a signal from the frequency domain back into the time domain
-
We use the Cooley-Tukey method
-
Works best when the signal length is a power of 2
-
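Below is a minimal sketch of a recursive radix-2 Cooley-Tukey FFT, assuming the input length is a power of 2 (which is why signals are padded to that size); for the IFFT, the caller scales the result by 1/N.

```cpp
#include <complex>
#include <vector>
#include <cmath>

// Recursively split into even/odd halves, transform each, and combine
// with twiddle factors e^(±2πik/n). Pass inverse=true for the IFFT.
void FFT(std::vector<std::complex<float>>& a, bool inverse)
{
    const size_t n = a.size();
    if (n <= 1) return;

    std::vector<std::complex<float>> even(n / 2), odd(n / 2);
    for (size_t i = 0; i < n / 2; ++i) { even[i] = a[2*i]; odd[i] = a[2*i + 1]; }
    FFT(even, inverse);
    FFT(odd, inverse);

    const float sign = inverse ? 1.0f : -1.0f;
    for (size_t k = 0; k < n / 2; ++k)
    {
        float angle = sign * 2.0f * 3.14159265f * (float)k / (float)n;
        std::complex<float> w(std::cos(angle), std::sin(angle));
        std::complex<float> t = w * odd[k];
        a[k]       = even[k] + t;
        a[k + n/2] = even[k] - t;
    }
}
```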
Impulse Response Generation
-
Once all the rays are collected, we still need to put them in a usable form
-
An impulse response is just the collection of those rays
-
The rays still need to be sorted based on when they would arrive
-
-
The ray's total path length divided by the speed of sound tells us how long the sound took to reach the listener
-
Combining that arrival time with the amount of energy remaining in the ray allows us to mimic different types of environments
-
Each ray therefore describes an impulse, and the collection of impulses summed together gives us the impulse response we were looking for
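Putting those pieces together, here is a minimal sketch of building the impulse response from the collected ray hits; RayHit, the 44.1 kHz sample rate, and the 343 m/s speed of sound are illustrative assumptions.

```cpp
#include <vector>
#include <cmath>
#include <algorithm>

// Each hit becomes an impulse at its arrival time (path length / speed of
// sound) carrying the ray's remaining energy; coinciding impulses sum.
struct RayHit { float totalPathLength; float remainingEnergy; };

std::vector<float> BuildImpulseResponse(const std::vector<RayHit>& hits,
                                        float sampleRate = 44100.0f)
{
    const float speedOfSound = 343.0f; // m/s in air

    // Find the latest arrival so we can size the response buffer.
    float maxTime = 0.0f;
    for (const RayHit& h : hits)
        maxTime = std::max(maxTime, h.totalPathLength / speedOfSound);

    std::vector<float> ir((size_t)(maxTime * sampleRate) + 1, 0.0f);

    // Sorting by arrival time happens implicitly: each hit is written to
    // the sample slot matching its arrival time, and energies accumulate.
    for (const RayHit& h : hits)
    {
        float arrivalTime = h.totalPathLength / speedOfSound;
        size_t index = (size_t)(arrivalTime * sampleRate);
        ir[index] += h.remainingEnergy;
    }
    return ir;
}
```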
[Plots: programmatically generated impulse response vs. a pre-recorded impulse response from a transit center]
Convolution
-
Convolution is a mathematical operation that combines two functions to create a third
-
In DSP, we use it to apply the impulse response to an audio signal
-
The type of convolution that is done is called Discrete Convolution
-
-
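For reference, here is a minimal sketch of discrete convolution computed directly in the time domain; its O(N·M) cost is what motivates the frequency-domain techniques below.

```cpp
#include <vector>

// Discrete convolution: y[n] = sum over k of x[k] * h[n - k].
std::vector<float> Convolve(const std::vector<float>& x, const std::vector<float>& h)
{
    std::vector<float> y(x.size() + h.size() - 1, 0.0f);
    for (size_t n = 0; n < x.size(); ++n)
        for (size_t k = 0; k < h.size(); ++k)
            y[n + k] += x[n] * h[k]; // every input sample scales a shifted copy of h
    return y;
}
```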
Reasons to use​
-
Constantly getting new input, infinite length​
-
A new frame every 10 milliseconds​
-
-
Impulse Response is very long, but it is finite​
-
Because of the length differences, we can decompose the signals into different blocks
-
Uniform Partitioning
-
Treat each frame as an equal, uniform block
-
Break up the Impulse Response into these blocks
-
-
This is where FFT comes in​
-
Convolution in the time domain equals multiplication in the frequency domain​
-
This is from the convolution theorem​
-
-
Much faster than computing convolution in the time domain​
-
-
Considerations for Convolution​
-
It's expensive and you can't slow down the audio thread​
-
-
-
There are multiple algorithms to be considered that have pros and cons​​
-
Overlap save and overlap add algorithms​
-
Very similar approaches​
-
Both require 1 + N FFTs to be performed, where N depends on the number of blocks considered in the new signal
-
About 1.9 ms per FFT​
-
-
Not really feasible for multiple audio sources
-
Even with threads, it's not guaranteed to be finished on time
-
Works best for offline computing of the signal
-
-
The algorithm opted for was the Frequency Domain Delay Line (FDL) algorithm​
-
Only computes 2 FFTs per frame​
-
Takes about 1.42 ms - 3.8 ms​​​
-
-
Allows for more frames of previous audio data to be used​
-
Was able to use up to 16 frames of data​
-
-
Main features​​​
-
List of blocks from previous frames​
-
Doesn't have to keep track of extra data​​
-
Has a buffer for accumulating and storing each FDL block and impulse response filter block
-
-
Drawbacks​
-
Needs to be threaded and operate at least 1 frame behind if we want multiple audio sources at once​
-
FFT computation time​
-
-
Still has a limit on the number of audio sources, but now more are allowed​​​​​
-
-
-
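Here is a minimal sketch of the FDL idea, reusing the FFT routine sketched earlier; block sizing, overlap handling, and threading are simplified for illustration, and FDLConvolver is a hypothetical name.

```cpp
#include <complex>
#include <vector>

void FFT(std::vector<std::complex<float>>& a, bool inverse); // sketched earlier

using Spectrum = std::vector<std::complex<float>>;

// Keep the spectra of recent input frames in a delay line, multiply each
// one with the matching impulse-response partition (convolution theorem),
// sum, and inverse-transform once. Only two FFTs run per frame: one
// forward for the new input block and one inverse for the output.
struct FDLConvolver
{
    std::vector<Spectrum> irPartitions; // FFTs of the IR blocks (precomputed)
    std::vector<Spectrum> delayLine;    // spectra of recent input frames

    void ProcessFrame(Spectrum inputBlock, Spectrum& outputBlock)
    {
        // 1) One forward FFT for the newly arrived frame.
        FFT(inputBlock, /*inverse=*/false);

        // 2) Push it to the front of the delay line; the oldest falls off.
        delayLine.insert(delayLine.begin(), inputBlock);
        if (delayLine.size() > irPartitions.size()) delayLine.pop_back();

        // 3) Multiply-accumulate each delayed spectrum with its IR partition.
        Spectrum acc(inputBlock.size());
        for (size_t b = 0; b < delayLine.size(); ++b)
            for (size_t k = 0; k < acc.size(); ++k)
                acc[k] += delayLine[b][k] * irPartitions[b][k];

        // 4) One inverse FFT produces this frame's reverberant audio.
        FFT(acc, /*inverse=*/true);
        for (auto& s : acc) s /= (float)acc.size(); // IFFT 1/N scaling
        outputBlock = acc;
    }
};
```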
Number of Bounces and Rays
-
There are only 30 bounces because, at that point, the energy has dropped below a defined energy threshold
-
Energy < 20%
-
-
These impulse responses were recorded in the same place and have line of sight with the audio source​
-
The biggest noticeable difference between them is in the magnitudes
-
These magnitudes are the energy recorded​
-
[Impulse response plots: 262,144 rays; 1,048,576 rays; 4,194,304 rays; 16,777,216 rays]
Drawbacks
-
Computationally intensive
-
Impulse response generation cannot be done in real-time
-
There's a delay that freezes the simulation when you generate a new impulse response
-
The time for impulse response generation increases as you add more audio sources
-
-
The digital signal processing for each convolution takes its own thread to run its computations
Applications
-
Saving off the impulse response
-
Using trigger volumes, a new set of impulses can be loaded and applied dynamically in real-time​
-
Similar to the light probes used to bake lighting into a scene, impulse responses can be baked at set positions and faded between
-
-
Quickly see how different sized spaces and materials affect sounds​
-
More accurate scenarios
-
Firefights, horror, sneaking, etc.
-
Future Work
-
Implement ray versus convex objects
-
Have more interesting geometry​
-
Ability to see how the placement of objects affects the impulse response
-
-
Consider all frequency bands​
-
Currently, each band is treated exactly the same
-
Each band absorbs and scatters different amounts of energy
-
-
Consider other elements outside of reverberation such as occlusion and obstruction​
-
Handle diffuse reflections in a way that does not increase the number of rays exponentially
-
Optimizations to the HLSL shader code and the convolution algorithm