Dynamic Geometric-Based Reverb
  • The study of the propagation of sound with rays is known as geometric acoustics, or ray acoustics
  • Since light and sound can both be represented as waves or rays, we can apply concepts from rendering light, such as ray tracing, to audio
  • Applying ray tracing to audio allows us to replicate the effect known as reverb
  • This allows for more realistic sounding virtual environments
Reverberation
  • Also called reverb
  • A collection of reflected sound waves that reach the listener after the direct sound
  • Shaped by environmental factors
    • Volume and geometry of the room
    • Material absorption factors
      • How much energy a material absorbs
    • Scattering factors
      • Different materials scatter energy differently
 
Below is an example of reverb:
Piano scale no reverb
Piano scale with reverb
Common Techniques
  • Algorithmic
    • Can sound realistic
    • Customizable parameters
      • Delay, room size, density, frequency filtering, etc.
    • Does not represent a real place
  • Pre-Recorded Impulse Response (IR)
    • Recorded in a real place
      • Captures scattering, material absorption, and the size of the room
    • Normally produced by an impulsive sound such as a clap, a popping balloon, or a gunshot
    • Requires recording equipment to be taken to the location
    • Uses convolution to apply the IR to the sound
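Applying an IR to a dry signal is a discrete convolution: every input sample launches a scaled copy of the impulse response. Below is a minimal C++ sketch of direct time-domain convolution, assuming mono `float` samples; the function name `convolve` is hypothetical and not taken from the project.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Direct time-domain discrete convolution: y[n] = sum_k x[k] * h[n - k].
// The output has x.size() + h.size() - 1 samples. The cost is O(|x| * |h|),
// which becomes impractical for long impulse responses.
std::vector<float> convolve(const std::vector<float>& x, const std::vector<float>& h) {
    assert(!x.empty() && !h.empty());
    std::vector<float> y(x.size() + h.size() - 1, 0.0f);
    for (std::size_t n = 0; n < x.size(); ++n)      // each input sample...
        for (std::size_t k = 0; k < h.size(); ++k)  // ...adds a scaled copy of h
            y[n + k] += x[n] * h[k];
    return y;
}
```

Convolving a single click with an IR returns the IR itself, which is why a clap or balloon pop is enough to capture a room.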
Impulse Response
  • The output of a dynamic system in response to a brief input signal (an impulse)
  • Determines how much reverb, and what kind, is added to a signal
  • Made of 3 parts
    • Direct path
      • Direct, unblocked path from the source to the listener
    • Early or specular reflections
      • Sound that bounces off a surface like a mirror, with the angle of reflection equal to the angle of incidence
    • Late or diffuse reflections
      • Spawned when sound hits an object and is scattered in many directions
      • Diffuse reflections are not calculated in this model
        • Every bounce would spawn several new rays, so the number of rays would grow exponentially with the number of bounces

Direct Path

Early Reflections

Late Reflections
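To put numbers on the exponential growth of diffuse rays, assume, purely for illustration (this is not a parameter of the project), that each bounce scatters every ray into $k$ new rays. Starting from $N_0$ rays, the count after $b$ bounces, evaluated for the smallest branching factor $k = 2$, the 30 bounces, and the $2^{24} = 16{,}777{,}216$ rays used in this project, is:

```latex
N(b) = N_0\,k^{b}, \qquad
k = 2,\; b = 30,\; N_0 = 2^{24}
\;\Longrightarrow\; N(30) = 2^{24} \cdot 2^{30} = 2^{54} \approx 1.8 \times 10^{16}
```

Even this minimal branching is about a billion times more rays than are traced for specular reflections alone.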

Setup of Project

  • Before implementing impulse response generation and convolution, a few things need to be set up
  • Goals of the thesis artifact
    • Play 3D audio and apply audio effects
    • Generate an impulse response
      • Ray tracing
      • Compute shaders to generate rays in parallel
    • Apply an impulse response to an audio signal
      • Fast Fourier Transform and its inverse
      • Convolution
    • Overall goal:
      • Audio with reverb that matches the virtual environment
Audio Engine
  • This project was built in my personal C++ game engine
  • XAudio2 API
    • Made for game developers
    • The engine already uses a DirectX11 renderer, so XAudio2 seemed like the most reasonable choice
  • X3DAudio
    • Extension to XAudio2
    • Allows the use of listeners and emitters
  • XAPO
    • Cross-platform Audio Processing Objects
    • Allows us to define our own audio effects
Ray Tracing
  • We use rays to propagate the sound energy across a space
  • Ray versus voxel grid
    • The world geometry is made of 1 x 1 x 1 blocks in a voxel grid
    • A fast algorithm exists for intersecting a ray with a voxel grid
    • It exploits the fact that a ray crosses the grid lines along each axis at regular intervals
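The traversal described above matches the well-known fast voxel traversal algorithm of Amanatides and Woo (1987). Below is a CPU-side C++ sketch for unit voxels, assuming the grid is queried through a caller-supplied callback; `traverseVoxels` and `visit` are hypothetical names, and since the project traces rays in an HLSL compute shader, this is only an illustration of the idea.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <limits>

struct Vec3 { double x, y, z; };

// Visits, in order, the unit voxels pierced by origin + t * dir for t in [0, maxT].
// `visit(ix, iy, iz)` returns true to stop (e.g. the voxel is solid).
// Returns true if a voxel stopped the walk.
bool traverseVoxels(Vec3 origin, Vec3 dir, double maxT,
                    const std::function<bool(int, int, int)>& visit) {
    assert(dir.x != 0.0 || dir.y != 0.0 || dir.z != 0.0);
    const double inf = std::numeric_limits<double>::infinity();
    const double o[3] = {origin.x, origin.y, origin.z};
    const double d[3] = {dir.x, dir.y, dir.z};
    int cell[3], step[3];
    double tMax[3], tDelta[3];
    for (int a = 0; a < 3; ++a) {
        cell[a] = static_cast<int>(std::floor(o[a]));
        if (d[a] > 0.0) {
            step[a] = 1;
            tDelta[a] = 1.0 / d[a];                 // t needed to cross one voxel
            tMax[a] = (cell[a] + 1 - o[a]) / d[a];  // t at the first grid line
        } else if (d[a] < 0.0) {
            step[a] = -1;
            tDelta[a] = -1.0 / d[a];
            tMax[a] = (o[a] - cell[a]) / -d[a];
        } else {
            step[a] = 0;                            // never crosses this axis
            tDelta[a] = inf;
            tMax[a] = inf;
        }
    }
    double t = 0.0;
    while (t <= maxT) {
        if (visit(cell[0], cell[1], cell[2])) return true;
        // Step along whichever axis reaches its next grid line first.
        int a = 0;
        if (tMax[1] < tMax[a]) a = 1;
        if (tMax[2] < tMax[a]) a = 2;
        t = tMax[a];
        cell[a] += step[a];
        tMax[a] += tDelta[a];
    }
    return false;
}
```

Each step costs one comparison and one addition per axis, which is what makes the test fast compared with intersecting every block individually.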
The concept is shown below on a 2D grid, but it can be applied in 3D as well.
  • When the ray hits one of the walls, the floor, or the ceiling, it reflects
    • We also keep track of a few variables
      • Increment the number of bounces
      • Compute the reflected energy
      • Add the ray length to the total path length
  • Ray versus sphere
    • The listening and emitting points are infinitely small
      • Almost impossible to hit with a ray
    • Spheres are used instead to give them volume and increase the chance of a ray hitting
  • When the ray hits the sphere
    • We record the number of bounces, the remaining energy, and the total path length including the final segment to the sphere
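A minimal C++ sketch of the per-bounce bookkeeping, specular reflection, and ray-versus-sphere test described above. It assumes unit-length directions and a single scalar absorption coefficient per surface, with reflected energy computed as `energy * (1 - absorption)`; that energy model and all names here (`RayState`, `onWallHit`, `raySphere`) are illustrative assumptions, not the project's HLSL code.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(double s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Per-ray bookkeeping (hypothetical layout).
struct RayState {
    int bounces = 0;
    double energy = 1.0;      // fraction of the emitted energy still carried
    double pathLength = 0.0;  // total distance travelled so far
};

// Specular reflection of direction d about the unit surface normal n.
Vec3 reflect(Vec3 d, Vec3 n) { return d - 2.0 * dot(d, n) * n; }

// Called when a segment of length segLength ends on a wall, floor, or ceiling.
// `absorption` is the surface's absorption coefficient in [0, 1].
void onWallHit(RayState& s, double segLength, double absorption) {
    s.bounces += 1;
    s.energy *= (1.0 - absorption);
    s.pathLength += segLength;
}

// Ray vs sphere: smallest t >= 0 with |o + t*d - c| = r, or -1 on a miss.
// Solves t^2 + 2bt + cc = 0, which assumes d is unit length.
double raySphere(Vec3 o, Vec3 d, Vec3 c, double r) {
    assert(r > 0.0);
    const Vec3 oc = o - c;
    const double b = dot(oc, d);
    const double cc = dot(oc, oc) - r * r;
    const double disc = b * b - cc;
    if (disc < 0.0) return -1.0;
    const double sq = std::sqrt(disc);
    double t = -b - sq;                 // nearer intersection
    if (t < 0.0) t = -b + sq;           // origin inside the sphere
    return t >= 0.0 ? t : -1.0;
}
```

On a sphere hit, the recorded path length would be `s.pathLength + t`, matching the "ray length plus total path length" bookkeeping above.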
Compute Shaders
  • We can utilize the GPU for the extra computation power that is needed
  • The more rays that are traced, the more accurate the simulation
    • The current simulation uses over 16 million rays that bounce 30 times
    • So many are needed because most of them will likely never hit the target
  • Compute shaders are part of the programmable shader pipeline in DirectX11
    • Because the engine already has a DirectX11 renderer, it was just a matter of hooking them up to the system
  • The shader needs to be precompiled
    • The shader is long and can take several minutes to compile
    • Precompiling lets the program start significantly faster
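With over 16 million rays, the dispatch has to respect Direct3D 11's compute limits of 1024 threads per thread group and 65535 thread groups per dimension. A small C++ sketch of choosing the group counts for an `ID3D11DeviceContext::Dispatch` call, assuming one ray per thread and a 2D grid of groups; `dispatchForRays` and the thread-group sizes in the test are hypothetical, not the project's settings.

```cpp
#include <cassert>
#include <cstdint>

// Direct3D 11 compute limits: at most 1024 threads per thread group and at
// most 65535 thread groups in each dimension of a single Dispatch call.
constexpr std::uint64_t kMaxThreadsPerGroup = 1024;
constexpr std::uint64_t kMaxGroupsPerDim = 65535;

struct DispatchSize { std::uint32_t x, y; };

// Chooses thread-group counts for one ray per thread. Groups are rounded up,
// so the shader must skip thread indices >= rayCount.
DispatchSize dispatchForRays(std::uint64_t rayCount, std::uint64_t threadsPerGroup) {
    assert(rayCount > 0);
    assert(threadsPerGroup > 0 && threadsPerGroup <= kMaxThreadsPerGroup);
    const std::uint64_t groups = (rayCount + threadsPerGroup - 1) / threadsPerGroup;
    // Use as few rows (y) as possible, then spread the groups evenly across them.
    const std::uint64_t y = (groups + kMaxGroupsPerDim - 1) / kMaxGroupsPerDim;
    const std::uint64_t x = (groups + y - 1) / y;
    assert(x <= kMaxGroupsPerDim && y <= kMaxGroupsPerDim);
    return {static_cast<std::uint32_t>(x), static_cast<std::uint32_t>(y)};
}
```

For 16,777,216 rays at 256 threads per group, 65,536 groups are needed, one more than a single dimension allows, so the sketch returns a 32768 x 2 grid; at 1024 threads per group a 16384 x 1 grid fits directly.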
Fast Fourier Transform
  • An optimized algorithm for computing the Discrete Fourier Transform (DFT)
  • The Fast Fourier Transform (FFT) transforms a signal from the time domain to the frequency domain
  • Its inverse, the IFFT, transforms a signal from the frequency domain back into the time domain
  • We use the Cooley-Tukey method
    • Works best when the signal length is a power of 2
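A compact C++ sketch of the iterative radix-2 Cooley-Tukey FFT and its inverse. The radix-2 form only handles power-of-two lengths, which is why the method works best for them; other lengths are typically zero-padded up to the next power of two. This is a generic textbook version, not the project's implementation, and the name `fft` and its `inverse` flag are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cd = std::complex<double>;

// In-place iterative radix-2 Cooley-Tukey FFT. With inverse = true it computes
// the IFFT, including the 1/N scaling. a.size() must be a power of two.
void fft(std::vector<cd>& a, bool inverse) {
    const std::size_t n = a.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    // Bit-reversal permutation so the butterflies can run in place.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        // Twiddle factor e^(-2*pi*i/len) forward, e^(+2*pi*i/len) inverse.
        const double ang = (inverse ? 2.0 : -2.0) * pi / static_cast<double>(len);
        const cd wlen(std::cos(ang), std::sin(ang));
        for (std::size_t i = 0; i < n; i += len) {
            cd w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k, w *= wlen) {
                const cd u = a[i + k];
                const cd v = a[i + k + len / 2] * w;
                a[i + k] = u + v;            // butterfly
                a[i + k + len / 2] = u - v;
            }
        }
    }
    if (inverse)
        for (cd& x : a) x /= static_cast<double>(n);
}
```

Each of the log2(N) passes touches every element once, giving O(N log N) work instead of the O(N^2) of a direct DFT.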
Impulse Response Generation
  • Once all the rays are collected, they still need to be put into a usable form
  • An impulse response is the collection of those rays
    • The rays need to be sorted by the time at which they would arrive
  • Dividing a ray's total path length by the speed of sound gives the time it took the ray to carry the sound to the listener
  • Weighting each arrival by the energy remaining in the ray lets us mimic different types of environments
  • Each ray therefore describes one impulse, and summing all of these impulses gives the impulse response we were looking for
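The steps above can be sketched in C++ as follows. The sketch assumes each recorded hit stores its total path length in metres and its remaining energy, a speed of sound of 343 m/s (air at about 20 °C), and that hits are binned to the nearest sample; `RayHit` and `buildImpulseResponse` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One ray that reached the listener sphere (hypothetical layout).
struct RayHit {
    double pathLength;  // total distance travelled, in metres (assumed unit)
    double energy;      // energy remaining on arrival
};

// Each hit becomes an impulse at t = pathLength / speedOfSound, scaled by its
// remaining energy. Hits that land in the same sample are summed.
std::vector<float> buildImpulseResponse(std::vector<RayHit> hits, double sampleRate,
                                        double speedOfSound = 343.0) {
    assert(sampleRate > 0.0 && speedOfSound > 0.0);
    if (hits.empty()) return {};
    // Sort by arrival time (equivalently, by path length).
    std::sort(hits.begin(), hits.end(),
              [](const RayHit& a, const RayHit& b) { return a.pathLength < b.pathLength; });
    auto sampleIndex = [&](const RayHit& h) {
        const double seconds = h.pathLength / speedOfSound;
        return static_cast<std::size_t>(std::lround(seconds * sampleRate));
    };
    // After sorting, the last hit arrives latest and sets the IR length.
    std::vector<float> ir(sampleIndex(hits.back()) + 1, 0.0f);
    for (const RayHit& h : hits)
        ir[sampleIndex(h)] += static_cast<float>(h.energy);
    return ir;
}
```

At a 48 kHz sample rate, one sample corresponds to about 7 mm of path length, so many distinct rays fall into the same bin and are summed.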
Programmatically Generated Impulse Response
Pre-generated Impulse Response from a Transit Center
Convolution
  • Convolution is a mathematical operation that combines 2 functions to produce a third function
  • In DSP, we use it to apply the impulse response to an audio signal
    • Because audio is sampled, the type of convolution used is discrete convolution
  • Reasons to use partitioned, FFT-based convolution
    • The input arrives constantly and is effectively infinite in length
      • A new frame arrives every 10 milliseconds
    • The impulse response is very long, but it is finite
    • Because of this difference in length, we can decompose the signals into blocks
    • Uniform partitioning
      • Treat each frame as an equal, uniform block
      • Break up the impulse response into blocks of the same size
    • This is where the FFT comes in
      • Convolution in the time domain equals multiplication in the frequency domain
        • This follows from the convolution theorem
      • Much faster than computing the convolution directly in the time domain
    • Considerations for convolution
      • It's expensive, and it must not slow down the audio thread
  • There are multiple algorithms to consider, each with pros and cons
    • Overlap-save and overlap-add algorithms
      • Very similar approaches
      • Both require 1 + N FFTs, where N is the number of blocks considered in the new signal
        • About 1.9 ms per FFT
      • Not really feasible for multiple audio sources
      • Even with threads, they are not guaranteed to finish on time
      • Work best for offline processing of the signal
    • The algorithm chosen was the Frequency Domain Delay Line (FDL) algorithm
      • Only computes 2 FFTs per frame
        • Takes about 1.42 ms to 3.8 ms
      • Allows more frames of previous audio data to be used
        • Was able to use up to 16 frames of data
      • Main features
        • A list of blocks from previous frames
        • Doesn't have to keep track of extra data
        • Has buffers for adding and storing each FDL block and each impulse response filter block
      • Drawbacks
        • Needs to be threaded and run at least 1 frame behind if we want multiple audio sources at once
          • Because of the FFT computation time
        • Still has a limit on the number of audio sources, but now more are allowed
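A minimal single-threaded C++ sketch of uniformly partitioned convolution with a frequency-domain delay line, written as overlap-save with an FFT size of twice the block size. Each `process` call performs exactly two FFTs (one forward, one inverse), matching the per-frame cost described above. The class name, its interface, and the use of `std::deque` for the delay line are illustrative assumptions; the project's threading and XAPO integration are not shown, and a textbook radix-2 FFT is repeated so the block compiles on its own.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

using cd = std::complex<double>;

// Textbook iterative radix-2 FFT; a.size() must be a power of two.
void fft(std::vector<cd>& a, bool inverse) {
    const std::size_t n = a.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    for (std::size_t i = 1, j = 0; i < n; ++i) {  // bit-reversal permutation
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double ang = (inverse ? 2.0 : -2.0) * pi / static_cast<double>(len);
        const cd wlen(std::cos(ang), std::sin(ang));
        for (std::size_t i = 0; i < n; i += len) {
            cd w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k, w *= wlen) {
                const cd u = a[i + k], v = a[i + k + len / 2] * w;
                a[i + k] = u + v;
                a[i + k + len / 2] = u - v;
            }
        }
    }
    if (inverse)
        for (cd& x : a) x /= static_cast<double>(n);
}

// Uniformly partitioned overlap-save convolution with an FDL.
class FdlConvolver {
public:
    FdlConvolver(const std::vector<float>& ir, std::size_t blockSize)
        : B_(blockSize), prevInput_(blockSize, 0.0) {
        assert(!ir.empty() && B_ > 0 && (B_ & (B_ - 1)) == 0);
        // Split the IR into B-sample partitions, zero-pad each to 2B, transform once.
        for (std::size_t start = 0; start < ir.size(); start += B_) {
            std::vector<cd> h(2 * B_, cd(0.0, 0.0));
            for (std::size_t i = 0; i < B_ && start + i < ir.size(); ++i) h[i] = ir[start + i];
            fft(h, false);
            filters_.push_back(std::move(h));
        }
        // One spectrum slot per partition, initially silent.
        fdl_.assign(filters_.size(), std::vector<cd>(2 * B_, cd(0.0, 0.0)));
    }

    // Consumes exactly B input samples and returns B output samples.
    std::vector<float> process(const std::vector<float>& in) {
        assert(in.size() == B_);
        std::vector<cd> x(2 * B_);
        for (std::size_t i = 0; i < B_; ++i) {  // [previous block | current block]
            x[i] = prevInput_[i];
            x[B_ + i] = in[i];
            prevInput_[i] = in[i];
        }
        fft(x, false);                           // FFT 1: forward
        fdl_.pop_back();                         // oldest spectrum leaves the line
        fdl_.push_front(std::move(x));           // newest spectrum enters slot 0
        std::vector<cd> acc(2 * B_, cd(0.0, 0.0));
        for (std::size_t p = 0; p < filters_.size(); ++p)  // slot p pairs with partition p
            for (std::size_t k = 0; k < 2 * B_; ++k) acc[k] += fdl_[p][k] * filters_[p][k];
        fft(acc, true);                          // FFT 2: inverse
        std::vector<float> out(B_);
        for (std::size_t i = 0; i < B_; ++i)    // overlap-save: keep the last B samples
            out[i] = static_cast<float>(acc[B_ + i].real());
        return out;
    }

private:
    std::size_t B_;
    std::vector<double> prevInput_;
    std::vector<std::vector<cd>> filters_;
    std::deque<std::vector<cd>> fdl_;
};
```

Because the IR spectra are computed once in the constructor, only the newest input block is transformed per frame; the older input spectra are reused from the delay line, so the loop over partitions is just complex multiply-adds.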
Number of Bounces and Rays
  • There are only 30 bounces because, at that point, the energy has dropped below a defined energy threshold
    • Energy < 20%
  • These impulse responses were recorded in the same place and have line of sight with the audio source
  • The most noticeable difference between them is their magnitude
    • These magnitudes represent the recorded energy
262,144 Rays
1,048,576 Rays
4,194,304 Rays
16,777,216 Rays
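The 30-bounce cutoff follows from the energy threshold. Under a simple model where every bounce keeps the same fraction (1 - α) of the energy (an assumption; the project may vary absorption per material), the bounce at which a ray drops below the threshold can be computed directly. For instance, reaching the 20% threshold after 30 bounces would correspond to an average absorption of about 5.2% per bounce, since 1 - 0.2^(1/30) ≈ 0.052. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Fraction of the emitted energy left after `bounces` reflections when every
// surface absorbs the same fraction `absorption`: E_n = (1 - absorption)^n.
double energyAfterBounces(double absorption, int bounces) {
    assert(absorption > 0.0 && absorption < 1.0 && bounces >= 0);
    return std::pow(1.0 - absorption, bounces);
}

// Smallest n with (1 - absorption)^n < threshold, i.e. the bounce after which
// a ray can stop being traced. From n > ln(threshold) / ln(1 - absorption).
int bouncesToThreshold(double absorption, double threshold) {
    assert(absorption > 0.0 && absorption < 1.0);
    assert(threshold > 0.0 && threshold < 1.0);
    const double n = std::log(threshold) / std::log(1.0 - absorption);
    return static_cast<int>(std::floor(n)) + 1;  // strict inequality
}
```

More absorbent rooms reach the threshold in fewer bounces, so a fixed bounce limit is effectively tuned for a particular range of materials.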
Drawbacks
  • Computationally intensive
  • Impulse response generation cannot be done in real-time
    • There's a delay that freezes the simulation whenever a new impulse response is generated
    • The time for impulse response generation increases as you add more audio sources
  • The convolution processing for each audio source needs its own thread to run its computations
Applications
  • Saving the impulse responses
    • Using trigger volumes, a new set of impulse responses can be loaded and applied dynamically in real time
    • Similar to the light probes used to bake lighting into a scene, the same technique can be applied to impulse responses, fading between them
  • Quickly see how differently sized spaces and materials affect sound
  • More accurate scenarios
    • Firefights, horror, sneaking, etc.
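Fading between baked impulse responses, in the spirit of blending light probes, can be sketched as a weighted blend. This C++ sketch assumes two probes along a single axis and a linear weight; `blendImpulseResponses` and `probeWeight` are hypothetical names, and the project may fade differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Linearly blends two baked impulse responses. Because convolution is linear,
// convolving with the blend equals blending the two convolved outputs, so
// sweeping w from 0 to 1 fades from probe A's reverb to probe B's.
std::vector<float> blendImpulseResponses(const std::vector<float>& a,
                                         const std::vector<float>& b, float w) {
    assert(w >= 0.0f && w <= 1.0f);
    std::vector<float> out(std::max(a.size(), b.size()), 0.0f);  // shorter IR is zero-padded
    for (std::size_t i = 0; i < out.size(); ++i) {
        const float va = i < a.size() ? a[i] : 0.0f;
        const float vb = i < b.size() ? b[i] : 0.0f;
        out[i] = (1.0f - w) * va + w * vb;
    }
    return out;
}

// Hypothetical weight for a listener moving between two probes along one axis:
// 0 at probe A, 1 at probe B, clamped outside that range.
float probeWeight(float listenerX, float probeAX, float probeBX) {
    assert(probeAX != probeBX);
    const float t = (listenerX - probeAX) / (probeBX - probeAX);
    return std::min(1.0f, std::max(0.0f, t));
}
```

By the same linearity, a real-time convolver could instead run both convolutions and crossfade their outputs with the same weight, which avoids rebuilding the IR filter blocks every frame.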
Future Work
  • Implement ray versus convex object intersection
    • Allow more interesting geometry
    • Ability to see how the placement of objects affects the impulse response
  • Consider all frequency bands
    • Currently, each band is treated exactly the same
    • In reality, materials absorb and scatter different amounts of energy in each band
  • Consider elements other than reverberation, such as occlusion and obstruction
  • Handle diffuse reflections in a way that does not increase the number of rays exponentially
  • Optimizations to the HLSL shader code and the convolution algorithm