Last week Ruth John asked me if she could use my standardized-audio-context package for a set of Web Audio demos she is currently preparing. Sure, I said, but when she showed me some early drafts I quickly realized that she was using AudioNodes which the standardized-audio-context package does not yet support. There is no particular reason for that. I just never used these AudioNodes in my own projects and no one actively asked for them.
I chose the StereoPannerNode as the first node to implement. It's a simple node with only one AudioParam called pan. If its value is -1 the signal will be panned all the way to the left. If the value is 1 there will only be a signal on the right. And if the value is zero nothing is going to happen. It's that easy. At least that's what I thought.
The browser support for the StereoPannerNode is actually pretty good. Only Safari, the web developer's best friend, has no native support. But the other browsers do closely follow the spec.
I was surprised to find out that the StereoPannerNode is meant to behave differently for mono and stereo input signals. And I was even more surprised that all browsers with a native implementation do actually honor that. The panning algorithm which should be used to compute the output is even part of the spec to avoid any ambiguity.
I'm not the first one trying to replicate the StereoPannerNode. As with many other Web Audio problems it's worth checking the GitHub account of mohayonao before reinventing the wheel. As expected there is also an implementation of the StereoPannerNode. It uses a cleverly designed graph of GainNodes and WaveShaperNodes in combination with a ChannelSplitterNode and a ChannelMergerNode to rebuild a StereoPannerNode. However it only supports the algorithm for mono signals. Nevertheless it is a good starting point and worth taking a closer look at.
Since we are going to use the terms GainNode, ChannelSplitterNode, ChannelMergerNode and WaveShaperNode as if they were well-known things from now on, I want to quickly explain what each of those nodes does. Feel free to skip ahead if you already know all that.
The GainNode is probably the most basic AudioNode of all. It has only one AudioParam called gain. Whatever the input signal is, it will be multiplied with the gain value.
The ChannelSplitterNode has only one input but can have multiple outputs. It splits the channels of the input across all its outputs.
The ChannelMergerNode is the counterpart of the ChannelSplitterNode. It can have multiple inputs but only one output. It maps all its inputs to the channels of the output.
The WaveShaperNode maps each sample of the input signal to a given output value. This output value is defined by the curve of the WaveShaperNode. This curve is nothing more than a big lookup table.
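To make that lookup a bit more concrete, here is a sketch of how a WaveShaperNode computes a single output sample. The `applyCurve` helper is made up for this example; the input sample selects a position within the curve and the output gets linearly interpolated between the two closest curve values, which is also roughly what the spec prescribes.

```javascript
// A simplified sketch of what a WaveShaperNode does for a single sample.
const applyCurve = (curve, sample) => {
    // The input sample is clamped to the range from -1 to 1 ...
    const clamped = Math.min(1, Math.max(-1, sample));
    // ... and mapped to a position within the curve.
    const position = ((clamped + 1) / 2) * (curve.length - 1);
    const lowerIndex = Math.floor(position);
    const upperIndex = Math.ceil(position);
    const fraction = position - lowerIndex;

    // The output is linearly interpolated between the two closest values.
    return curve[lowerIndex] * (1 - fraction) + curve[upperIndex] * fraction;
};
```

With a curve of [ -1, 0, 1 ] the output equals the input, which is why such a curve is sometimes used as a no-op.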
We are going to build our homegrown StereoPannerNode like a black box. There should be no way to tell from the outside what is happening inside. We just need to expose an input, an output and the pan AudioParam.
In this particular case the ChannelSplitterNode and the ChannelMergerNode are ideal candidates for the input and output nodes. They have one input and one output respectively and allow us to work with the raw channels in between them. This will give us a basic architecture like this:
We also need to expose an AudioParam to control the pan value. A ConstantSourceNode looks very promising for that job. Unfortunately it needs to be started somehow and there is no functionality of the StereoPannerNode which we could abuse to start it.
A better approach is to use a WaveShaperNode that is connected to our input. We give it a DC curve (which is actually a line) of [ 1, 1 ]. This guarantees that the output of that WaveShaperNode is always 1 no matter what its input is. We chain that WaveShaperNode with a GainNode and expose its gain AudioParam as the pan AudioParam. We end up with a simulated AudioParam that feeds its current value into the internal graph.
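To illustrate why that works, here is a little standalone computation. `shapeWithDcCurve` is a made-up helper which mimics the interpolation of a WaveShaperNode with a curve of [ 1, 1 ]; whatever sample comes in, the result is always 1, and the multiplication of the following GainNode then turns that constant signal into the current pan value.

```javascript
// With a two-element curve of [ 1, 1 ] a WaveShaperNode interpolates
// between two identical values. No matter where the input sample lands
// the result is therefore always 1.
const shapeWithDcCurve = (sample) => {
    const clamped = Math.min(1, Math.max(-1, sample));
    const fraction = (clamped + 1) / 2; // position between both curve values

    return 1 * (1 - fraction) + 1 * fraction;
};

// The GainNode that follows multiplies the constant 1 with its gain value
// which is why its gain AudioParam can act as the simulated pan AudioParam.
const simulatedPanSignal = (sample, pan) => shapeWithDcCurve(sample) * pan;
```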
Next we need to apply the formula from the panning algorithm. The pan value needs to be transformed a bit in order to get the values which will then have to be multiplied with the left and right channel respectively.
```javascript
// left channel
sample * Math.cos(((pan + 1) / 2) * Math.PI / 2)

// right channel
sample * Math.sin(((pan + 1) / 2) * Math.PI / 2)
```
The algorithm requires the pan value to be mapped to a value between 0 and 1. This is what the ((pan + 1) / 2) part of the formula does. That value then gets fed into the cosine (or sine) function to produce the final value. This can be achieved by using a WaveShaperNode with a lot of precomputed values. A reduced version of the curve might look like this:
```javascript
[
    1,
    1,
    1,
    1,
    Math.cos(0 * Math.PI / 2),
    Math.cos(0.25 * Math.PI / 2),
    Math.cos(0.5 * Math.PI / 2),
    Math.cos(0.75 * Math.PI / 2),
    Math.cos(1 * Math.PI / 2)
]
```
The values of a WaveShaperNode's curve always cover the input range from -1 to 1. As you can see that would waste a lot of values in our curve. All values for inputs from -1 up to (but not including) 0 are never actually used because the range of possible input values starts at 0. A nice shortcut is therefore to not map the pan value from [ -1; 1 ] to [ 0; 1 ] in the first place and to instead spread the curve across the whole range.
```javascript
[
    Math.cos(0 * Math.PI / 2),
    Math.cos(0.125 * Math.PI / 2),
    Math.cos(0.25 * Math.PI / 2),
    Math.cos(0.375 * Math.PI / 2),
    Math.cos(0.5 * Math.PI / 2),
    Math.cos(0.625 * Math.PI / 2),
    Math.cos(0.75 * Math.PI / 2),
    Math.cos(0.875 * Math.PI / 2),
    Math.cos(1 * Math.PI / 2)
]
```
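Here is a sketch of how both curves could be precomputed at an arbitrary resolution. `createStereoPannerCurves` and its `curveLength` parameter are invented for this example; a real implementation would probably pick a fairly large length to keep the interpolation error small.

```javascript
// Precompute the cosine and sine curves for the two WaveShaperNodes.
const createStereoPannerCurves = (curveLength) => {
    const cosCurve = new Float32Array(curveLength);
    const sinCurve = new Float32Array(curveLength);

    for (let i = 0; i < curveLength; i += 1) {
        // Spread the pan range of [ -1; 1 ] across the whole curve.
        const x = i / (curveLength - 1);

        cosCurve[i] = Math.cos(x * Math.PI / 2);
        sinCurve[i] = Math.sin(x * Math.PI / 2);
    }

    return { cosCurve, sinCurve };
};
```

With a curveLength of 9 this produces exactly the reduced curve shown above plus its sine counterpart.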
This little trick allows us to use the full range of the WaveShaperNode's curve and also saves us from doing some unnecessary value mapping. However we still need two of those WaveShaperNodes as the computation for the right channel uses the sine instead of the cosine function. Those two WaveShaperNodes are abbreviated as WSN in the following diagram.
We also create a GainNode for each channel. The WaveShaperNodes are not connected to those GainNodes directly. They control their gain AudioParams instead. In other words the GainNodes multiply the values computed by the WaveShaperNodes with the input signal.
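Put together, the whole mono algorithm boils down to the following per-sample computation. `panMonoSample` is of course a made-up name and not how the actual graph computes its output sample by sample, but it produces the same values.

```javascript
// Compute the output of the mono panning algorithm for a single sample.
const panMonoSample = (sample, pan) => {
    // Map the pan value from [ -1; 1 ] to [ 0; 1 ].
    const x = (pan + 1) / 2;

    return [
        sample * Math.cos(x * Math.PI / 2), // left channel
        sample * Math.sin(x * Math.PI / 2) // right channel
    ];
};
```

A pan value of -1 keeps the full signal on the left channel and silences the right one.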
And with that we have built a fully working clone of the StereoPannerNode. I think it's a nice example which shows what is already possible without using an AudioWorklet. But at the same time it also stresses the need for the AudioWorklet as there is no way to know if an input signal is mono or stereo without it. The only solution I can think of is requiring the channelCountMode to be 'explicit' for now. This ensures that the channelCount is predictable and can't be changed dynamically by the input signal.
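A minimal sketch of how that restriction could look. `lockChannelCountMode` is a hypothetical helper; it pins the channelCountMode of a given fake node to 'explicit' and throws whenever someone tries to change it.

```javascript
// lockChannelCountMode is a hypothetical helper which keeps the
// channelCount predictable by refusing any change to the channelCountMode.
const lockChannelCountMode = (fakeStereoPannerNode) => {
    Object.defineProperty(fakeStereoPannerNode, 'channelCountMode', {
        get: () => 'explicit',
        set: () => {
            throw new Error("The channelCountMode can't be changed.");
        }
    });

    return fakeStereoPannerNode;
};
```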
The algorithm for a stereo signal is a bit more complicated but can be built with the same technique. For the curious, the implementation of the stereo algorithm can be looked up in the source code.
It's worth noting that the accuracy of this method depends on the size of the curves used for the WaveShaperNodes.
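This can be verified with a quick computation which compares the linearly interpolated curve value against the exact result for two different curve sizes. The `interpolateCosCurve` helper is invented for this example and mimics the lookup of a WaveShaperNode without allocating the curve.

```javascript
// interpolateCosCurve mimics the linear interpolation of a WaveShaperNode
// with a cosine curve of the given length.
const interpolateCosCurve = (curveLength, pan) => {
    const position = ((pan + 1) / 2) * (curveLength - 1);
    const lowerIndex = Math.floor(position);
    const upperIndex = Math.min(lowerIndex + 1, curveLength - 1);
    const fraction = position - lowerIndex;
    const valueAt = (index) => Math.cos((index / (curveLength - 1)) * Math.PI / 2);

    return valueAt(lowerIndex) * (1 - fraction) + valueAt(upperIndex) * fraction;
};

// The error shrinks drastically when the curve gets bigger.
const exact = Math.cos(((0.3 + 1) / 2) * Math.PI / 2);
const coarseError = Math.abs(interpolateCosCurve(9, 0.3) - exact);
const fineError = Math.abs(interpolateCosCurve(16385, 0.3) - exact);
```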
An interesting fun fact is that the algorithm for mono signals will actually modify the signal even if the pan value is zero, but that is absolutely intentional.