What else can we do with the Web Audio API?

Of course the Web Audio API is meant for synthesizing and processing audio data. It is tailored for that use case. But at least in our digital world audio data is just a series of numbers, which are typically somewhere between +1 and -1. So why can't we use the Web Audio API for general computations?

Almost a year ago I had the pleasure of giving a talk at the Web Audio Conference in Atlanta. The conference featured a lot of great talks, which I really appreciated as an attendee. However, as a speaker it was tough to cut my own talk down until it was short enough to fit into the schedule. I had the feeling that I had to rush through my slides. Since then I have been planning to write down my findings in more detail, but I never got around to it. Luckily I was asked to repeat my talk at our local Web Audio Meetup here in Berlin a few weeks ago. Of course I did not want to present a one-year-old talk, so I had to revisit all my material to make sure it was still up to date. I took the opportunity to also write a summary along the way. You are currently reading the result. Just in case you prefer clicking through slides, here are the links to the original slide deck and the updated version.

Why should we abuse the Web Audio API?

There are a couple of reasons why I think the Web Audio API can be used for general purpose computations. The most obvious argument is that we can reuse an existing implementation. Even better, we don't have to maintain that implementation. It's done for us by the browser vendors. We have to ship less code since it is already part of each user's browser, and we hopefully encounter fewer bugs since the implementation we are using is the same one that thousands of other users rely on as well.

Another positive aspect of using the Web Audio API is that we get asynchronous and non-blocking execution for free. Due to its architecture the Web Audio API runs in its own thread and therefore does not interfere with animations or other important tasks that run in the main window's thread. You could of course achieve the same result by using Web Workers, but then you have to deal with the complexity of asynchronous programming yourself.

As already mentioned, the Web Audio API is a part of the browser. That means it is written in a compiled language, something like C or C++. Therefore it could potentially be faster than any clever implementation anyone can write in JavaScript or WebAssembly. I did some performance tests to check my assumptions, as you will see later.

What kind of computations can be done with the Web Audio API?

In my opinion it does not make sense to use the Web Audio API for every computation you have to do. It only pays off if the computation gets relatively complex. It's also worth mentioning that you need signal-like data. Computing single values does not really make sense. But if you have a series of values which all need to be processed in the same way, you might benefit from using the Web Audio API.

Although the Web Audio API only pays off for complex problems, the building blocks are super simple, as you will see.

Boilerplate

Before the actual computations can be done a bit of boilerplate is inevitable. First an OfflineAudioContext and an AudioBuffer need to be created. The AudioBuffer gets the actual values copied onto its channel(s). Finally we also need an AudioBufferSourceNode which is used to "play" the AudioBuffer. Once all that is done the AudioBufferSourceNode can be scheduled to start "playing" at the start of the OfflineAudioContext. We don't connect it yet because that depends on the type of computation we want to do.

// the series of numbers the computation should be applied to
const values = new Float32Array([/* ... */]);
// anyValidSampleRate is a placeholder for any sample rate the browser supports
const audioContext = new OfflineAudioContext(
    1, values.length, anyValidSampleRate
);
const valuesBuffer = new AudioBuffer({
    length: values.length, sampleRate: audioContext.sampleRate
});
valuesBuffer.copyToChannel(values, 0);
const valuesBufferSource = new AudioBufferSourceNode(audioContext, {
    buffer: valuesBuffer
});
valuesBufferSource.start(0);
// ...

After the computations are set up another bit of boilerplate is necessary. The OfflineAudioContext needs to be started and finally the channel(s) from the renderedBuffer can be used to retrieve the actual results.

// ...
audioContext
    .startRendering()
    .then((renderedBuffer) => {
        const results = new Float32Array(values.length);

        renderedBuffer.copyFromChannel(results, 0);

        return results;
    });

Addition

Adding a constant value or another signal to a given signal is the simplest operation possible. Just create an AudioBufferSourceNode for a signal or a ConstantSourceNode for a constant value. Then all that needs to be done is connecting the nodes to the destination of the OfflineAudioContext, because the Web Audio API sums up all signals which are connected to the same input. An example for the addition of a constant value looks like this:

//...
const constantSource = new ConstantSourceNode(audioContext, {
    offset: 0.3
});
constantSource.start(0);

valuesBufferSource.connect(audioContext.destination);
constantSource.connect(audioContext.destination);
//...

Here is a link to the equivalent code in JavaScript. Computing five million values with the Web Audio API or plain JavaScript had the following performance characteristics on my relatively old MacBook:

Implementation           | Chrome (59) | Firefox (54)
-------------------------|-------------|-------------
JavaScript one addition  | 802ms       | 107ms
JavaScript ten additions | 7778ms      | 1159ms
WAA one addition         | 162ms       | 173ms
WAA ten additions        | 779ms       | 915ms

Here is a link to the actual performance test code.
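The plain JavaScript counterpart is not reproduced in this text. As a rough idea of what such a loop might look like, here is a minimal sketch, assuming the values live in a Float32Array as in the boilerplate. The name addConstant is hypothetical, and this is not necessarily the code which was used for the measurements.

```javascript
// Plain JavaScript counterpart of the constant addition (illustrative sketch).
// `addConstant` is a hypothetical helper name, not taken from the linked code.
function addConstant(values, constant) {
    const results = new Float32Array(values.length);

    // every value is handled individually on the calling thread
    for (let i = 0; i < values.length; i += 1) {
        results[i] = values[i] + constant;
    }

    return results;
}

const sum = addConstant(new Float32Array([0.1, 0.2, -0.5]), 0.3);
```

Unlike the Web Audio API version, this loop blocks the thread it runs on until all values are processed.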

Subtraction

Subtracting a constant value or another signal from a given signal looks almost identical. The only difference is that you have to insert a GainNode with a gain of minus one to negate the value(s).

//...
const constantSource = new ConstantSourceNode(audioContext, {
    offset: 0.7
});
constantSource.start(0);
const gain = new GainNode(audioContext, {
    gain: -1
});

valuesBufferSource.connect(audioContext.destination);
constantSource
    .connect(gain)
    .connect(audioContext.destination);
//...

Here is a link to the corresponding JavaScript code.

Multiplication

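To multiply a given signal with a constant value the only thing needed is a GainNode with the desired factor as its gain value. Luckily it is also possible to feed the output of an AudioBufferSourceNode into an AudioParam. To multiply two signals with each other you can therefore create a second AudioBufferSourceNode and connect it to the gain property of the GainNode. In that case the gain itself should be set to 0, because an AudioParam adds any connected signal to its own value. Multiplying with a constant value looks like this: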

//...
const gain = new GainNode(audioContext, {
    gain: 0.4
});

valuesBufferSource
    .connect(gain)
    .connect(audioContext.destination);
//...

Once again, here is a link to the JavaScript version.
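The multiplication of two signals is only described in words above. A minimal sketch of the wiring could look like the following. The function name multiplySignals and its parameter names are hypothetical, and both sources are assumed to be AudioBufferSourceNodes which were created and started as shown in the boilerplate.

```javascript
// Sketch: multiply two signals by feeding one of them into the gain AudioParam.
// `multiplySignals` and its parameter names are hypothetical.
function multiplySignals(audioContext, signalSource, factorSource) {
    // The gain's own value has to be 0, because an AudioParam adds the
    // connected signal to its own value (the default gain would be 1).
    const gain = new GainNode(audioContext, { gain: 0 });

    // Each sample of factorSource now acts as the gain for the
    // corresponding sample of signalSource.
    factorSource.connect(gain.gain);
    signalSource.connect(gain).connect(audioContext.destination);

    return gain;
}
```

The function only wires up the graph. The rendering still needs to be started with startRendering() as usual.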

Division

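Just as subtraction looks almost like addition, division looks almost like multiplication. Dividing a signal by a constant value is nothing more than multiplying it with the reciprocal of that value.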

//...
const gain = new GainNode(audioContext, {
    gain: 1 / 0.2
});

valuesBufferSource
    .connect(gain)
    .connect(audioContext.destination);
//...

I also wrote a JavaScript version of the division and compared both implementations in a performance test. Here are the results:

Implementation           | Chrome (59) | Firefox (54)
-------------------------|-------------|-------------
JavaScript one division  | 781ms       | 105ms
JavaScript ten divisions | 7905ms      | 1219ms
WAA one division         | 97ms        | 140ms
WAA ten divisions        | 143ms       | 173ms

Downsampling

Downsampling can be done easily. I would argue that it is a classic audio related task, although it can of course also be applied to non-audio data.

Sadly the actual algorithm for downsampling is not specified. Every browser came up with a different solution. Chromium based browsers for example use something called SincResampler and Firefox is using the resampler provided by Speex.

If you don't mind the minor differences you can downsample a signal like this:

const downsampleContext = new OfflineAudioContext(
    audioBuffer.numberOfChannels,
    audioBuffer.duration * 11025,
    11025
);
const bufferSource = new AudioBufferSourceNode(downsampleContext, {
    buffer: audioBuffer
});
bufferSource.start(0);
bufferSource.connect(downsampleContext.destination);

downsampleContext.startRendering();
// will resolve with the downsampled audioBuffer

Down-mixing

Down-mixing is the process of converting a signal with more than one channel into one with fewer channels. This is clearly an audio task, and converting stereo to mono is a popular example. It can also be done easily with the Web Audio API and is specified in detail. In theory down-mixing can also be used to add two or more signals. Let's say you create an AudioBuffer with two signals as its channels. Since the mono down-mix averages both channels, you can convert it to mono and apply a GainNode with a gain of 2 to get the added signal.

Here is a little example which shows how to convert a signal (already wrapped in an AudioBuffer) to mono:

const downmixContext = new OfflineAudioContext(
    1,
    audioBuffer.length,
    audioBuffer.sampleRate
);
const bufferSource = new AudioBufferSourceNode(downmixContext, {
    buffer: audioBuffer
});
bufferSource.start(0);
bufferSource.connect(downmixContext.destination);

downmixContext.startRendering();
// will resolve with the downmixed audioBuffer
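The idea of adding two signals by down-mixing relies on the rule for converting stereo to mono, which averages both channels. The arithmetic can be illustrated in plain JavaScript as follows. This is only a sketch of the math, not Web Audio API code, and the name downmixStereoToMono is hypothetical.

```javascript
// Plain JavaScript illustration of the arithmetic behind the trick.
// The stereo to mono down-mix averages both channels: 0.5 * (left + right).
function downmixStereoToMono(left, right) {
    const mono = new Float32Array(left.length);

    for (let i = 0; i < left.length; i += 1) {
        mono[i] = 0.5 * (left[i] + right[i]);
    }

    return mono;
}

const left = new Float32Array([0.25, 0.5, -0.125]);
const right = new Float32Array([0.25, -0.25, 0.5]);

// A gain of 2 undoes the averaging and yields the plain sum of both signals.
const added = downmixStereoToMono(left, right).map((value) => 2 * value);
// added contains 0.5, 0.25 and 0.375
```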

Applying an FFT

Typically an FFT is explained by showing an equalizer-like visualization of an audio signal. But you can apply an FFT to all sorts of data. It is by no means limited to audio related usage. There is an FFT implementation in every implementation of the Web Audio API, however you can't access it directly. It can only be accessed indirectly via the AnalyserNode, and the AnalyserNode doesn't really give you control over which part of the signal the FFT gets applied to. It's meant to be used with live data.

If you need to apply an FFT there is no other option but relying on a JavaScript implementation. But there is still room for optimization. It's worth checking out the WebAudio FFT Performance Test before picking an FFT library, because there are huge performance differences.

Applying an IIR Filter

The IIRFilterNode is fairly new and has not yet landed in Safari. But it is worth using. In Chrome the native implementation is more than 10 times faster than my trivial JavaScript implementation, which is basically a simplified version of the source code used by Chromium. There are a lot of applications for an IIR filter, but if you are in doubt what to use it for, you may want to check out the IIR Filter Workshop.

//...
const iirFilter = new IIRFilterNode(audioContext, {
    feedback: new Float32Array([/* ... */]),
    feedforward: new Float32Array([/* ... */])
});

valuesBufferSource
    .connect(iirFilter)
    .connect(audioContext.destination);
//...

Again, I compared my JavaScript implementation with the Web Audio API by running this performance test to prove the huge performance benefits.

Implementation | Chrome (59) | Firefox (54)
---------------|-------------|-------------
JavaScript     | 2289ms      | 490ms
WAA            | 189ms       | 283ms
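For readers who wonder what an IIR filter actually computes, the underlying difference equation can be written down in a few lines. The following is a naive sketch under the assumption that feedforward and feedback have the same meaning as the options of the IIRFilterNode. It is neither the JavaScript implementation which was measured above nor the code used by Chromium, and the name applyIirFilter is hypothetical.

```javascript
// Naive sketch of the difference equation behind an IIR filter:
// y[n] = (sum of b[k] * x[n - k] - sum of a[k] * y[n - k] for k >= 1) / a[0]
// with b = feedforward and a = feedback. `applyIirFilter` is a hypothetical name.
function applyIirFilter(input, feedforward, feedback) {
    const output = new Float32Array(input.length);

    for (let n = 0; n < input.length; n += 1) {
        let sum = 0;

        // weighted sum of the current and the previous input values
        for (let k = 0; k < feedforward.length && k <= n; k += 1) {
            sum += feedforward[k] * input[n - k];
        }

        // subtract the weighted sum of the previous output values
        for (let k = 1; k < feedback.length && k <= n; k += 1) {
            sum -= feedback[k] * output[n - k];
        }

        output[n] = sum / feedback[0];
    }

    return output;
}

// Feeding an impulse through a simple one-pole filter halves the value each step.
const impulseResponse = applyIirFilter(new Float32Array([1, 0, 0, 0]), [0.5], [1, -0.5]);
// impulseResponse contains 0.5, 0.25, 0.125 and 0.0625
```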

Not so good parts

If you have read this far, you are probably convinced that it's worth giving the Web Audio API a try the next time you have to do some computations. However, as always, there are also some downsides which I don't want to keep secret.

The main downside is that you have to get your data into the Web Audio API before it can be used and once everything is done you have to unwrap the data again in order to receive the result. This is necessary whenever you use the Web Audio API, however it is only needed upfront and as a last step. If you manage to chain multiple computations after each other you don't have to do it between each computation as long as you don't leave the Web Audio API.

At the Web Audio Meetup someone asked if it would be possible to hide the boilerplate with an easy to use framework. I wasn't really prepared for that question, but after thinking about it for quite some time, I think it would be very difficult to account for the fact that the computation could be the first, the last or some computation in between where data marshaling would not be needed.
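For the simple case in which one complete chain of computations is wired up in a single place, the boilerplate can at least be tucked away behind a callback. The following sketch shows that idea. It does not solve the harder problem of composing independently written computations, and the names computeWithWebAudio and wireGraph are hypothetical.

```javascript
// Sketch of a helper that hides the boilerplate for one chain of computations.
// `computeWithWebAudio` and `wireGraph` are hypothetical names.
function computeWithWebAudio(values, sampleRate, wireGraph) {
    const audioContext = new OfflineAudioContext(1, values.length, sampleRate);
    const valuesBuffer = new AudioBuffer({ length: values.length, sampleRate });

    valuesBuffer.copyToChannel(values, 0);

    const valuesBufferSource = new AudioBufferSourceNode(audioContext, {
        buffer: valuesBuffer
    });

    valuesBufferSource.start(0);

    // The callback connects the source to the destination, either directly
    // or through any number of intermediate nodes.
    wireGraph(audioContext, valuesBufferSource);

    // Unwrap the rendered data once at the very end.
    return audioContext.startRendering().then((renderedBuffer) => {
        const results = new Float32Array(values.length);

        renderedBuffer.copyFromChannel(results, 0);

        return results;
    });
}
```

A multiplication with 0.4 would then be a call with a callback which connects the given source to a GainNode and the GainNode to audioContext.destination. The values are expected to be a Float32Array.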

Could we get even better performance?

There is always something which could be tweaked and there is a good chance that other implementations could be even faster. Using asm.js or WebAssembly is something which could be explored if the performance still needs to be improved. A completely different approach could be to use WebGL.