A brief foreword
This is the first in a series of articles I intend to write on the experience of
building a userspace audio stack for Asahi Linux. We're going to discuss everything
from the horrific state of the Linux userspace audio stack, to how audio engineers
get you hearing sounds that are not being physically produced by a speaker.
Working on this over the last two years has been an absolute blast, helped in
no small part by the Asahi community being made up of some of the smartest and most
welcoming people one could hope to work with. For a first experience with semi-serious work on a FOSS project, you really could not hope for anything better.
Reverse Engineering a Speaker - Part 1
Finding something to do
Apple's MacBook Pro line has always had a reputation for delivering built-in audio
quality that is far superior to almost every other similar device. The redesigned
Apple Silicon 14" and 16" models take this further, in my opinion setting a whole
new standard for what consumers should seek, nay, demand from manufacturers.
When I first got a 14" MacBook Pro with the intention of contributing to Asahi Linux,
I ran into a bit of an issue. At the time, I had very little experience hacking on the kernel and very little practical C experience in general; I was studying medicine and had no time to learn either. Well shit. That is, until Martin "povik" Povišer wrote a functioning ALSA driver for the Apple Silicon platform.
A history of microspeakers
Small speakers have never been good. The market segment they serve has traditionally not been one of discerning connoisseurs. They are therefore manufactured with cheap, crappy materials, and to a very poor mechanical standard. For a long
time, this was simply the accepted state of affairs - small speakers suck, there's nothing
we can do, it's just how it is. Dark times indeed.
Traditionally, because of how delicate and pathetic these "microspeakers" are, embedded devices
which utilise them sometimes had some sort of firmware which handled driving them. This "handling" was
almost always just some basic overexcursion and overcurrent protection. The former prevents
the speaker from ripping itself apart, and the latter prevents the speaker from melting
itself. Doing anything else was just too computationally expensive. Famously, certain Chromebooks omitted this protection, and it was possible to destroy their speakers from userspace.
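To give a sense of just how basic that "handling" typically was, here is a minimal sketch of the idea in C. Everything in it (the threshold, the names) is hypothetical; real firmware would model the driver's excursion and current draw rather than just clamping peaks, but the compute budget is the same: a few cheap operations per sample.

```c
/*
 * Hypothetical sketch of naive overexcursion/overcurrent protection:
 * clamp every sample to a conservative peak level so the driver is
 * never asked to move (or draw) more than it can survive.
 * SAFE_PEAK and the function name are made up for illustration.
 */
#include <stddef.h>
#include <stdint.h>

#define SAFE_PEAK 20000 /* conservative ceiling on a 16-bit signal */

void protect_block(int16_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (buf[i] > SAFE_PEAK)
            buf[i] = SAFE_PEAK;
        else if (buf[i] < -SAFE_PEAK)
            buf[i] = -SAFE_PEAK;
    }
}
```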
Shifting gears to the other end of the spectrum for a second, we need to talk about performance venues.
Most modern large venue speaker arrays are controlled by a drive rack, which is a rack containing
some sort of DSP unit feeding into beefy, expensive Class D amps.
It will take an input, usually stereo, and apply all sorts of room correction effects and EQ to it so that the revolving door of engineers doesn't need to worry about the nuances of the room.
This concept extends to modern mixing consoles, too, which are usually entirely digital. Some
are even just control surfaces and processors for networked analogue preamp racks,
which live closer to the stage.
This does kinda sound like something we might benefit from on small devices though, right?
We know that small speakers suck, so what if we could apply the drive rack concept to
small devices? Unfortunately, the DSPs in professional audio gear are usually
thousands of dollars per unit, proprietary implementations of proprietary
ISAs (which are usually VLIW-based), and generally just awful, awful things to try and program. Oh, and they're as big and as power-hungry as a consumer GPU too. Imagine shoving AMD's Navi 31 into a Bluetooth speaker. Yeah, no thanks.
We don't really need a giant pro-grade DSP though. Most pro DSPs can handle 32+ channels,
each with 2 or more digital effects running. In most devices with sucky small speakers, though, we
usually only have two channels of audio. As it turns out, floating point math is now
really fast and really cheap. Even jellybean microcontroller-tier cores are fast enough
to do basic DSP on a 16-bit 44.1 kHz stereo signal. See where we're going with this?
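To put some numbers on that: the workhorse of this kind of processing is the biquad, a second-order IIR filter that costs about five multiplies and four adds per sample. A rough sketch (with placeholder coefficients; a real tuning would derive them from something like the Audio EQ Cookbook formulas) might look like this:

```c
/*
 * A single biquad (second-order IIR) filter in Direct Form I --
 * the basic building block of EQ, crossovers, and shelving filters.
 * Coefficients here are placeholders; setting b0 = 1 and the rest
 * to 0 gives a pass-through, and a real filter would compute them
 * for the desired response.
 */
#include <stddef.h>
#include <stdint.h>

typedef struct {
    float b0, b1, b2, a1, a2;   /* normalised coefficients (a0 == 1) */
    float x1, x2, y1, y2;       /* filter state, one struct per channel */
} biquad;

static inline float biquad_process(biquad *f, float x)
{
    float y = f->b0 * x + f->b1 * f->x1 + f->b2 * f->x2
            - f->a1 * f->y1 - f->a2 * f->y2;
    f->x2 = f->x1; f->x1 = x;
    f->y2 = f->y1; f->y1 = y;
    return y;
}

/* Run one filter stage over an interleaved 16-bit stereo buffer,
 * in place: roughly 5 multiplies and 4 adds per sample. */
void process_stereo(biquad f[2], int16_t *buf, size_t frames)
{
    for (size_t i = 0; i < frames; i++) {
        for (int ch = 0; ch < 2; ch++) {
            float x = buf[2 * i + ch] / 32768.0f;
            float y = biquad_process(&f[ch], x);
            if (y > 1.0f)  y = 1.0f;    /* crude clip guard */
            if (y < -1.0f) y = -1.0f;
            buf[2 * i + ch] = (int16_t)(y * 32767.0f);
        }
    }
}
```

At 44.1 kHz stereo, even a chain of a dozen such stages works out to a few million multiply-adds per second, trivial for any core with hardware floating point.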
Smoke and mirrors
This would be a very quick and boring story if it ended with, "And then we turned on the
ASoC driver and the speakers worked and we all rejoiced." Turns out that
macOS handles everything in CoreAudio, at the border between the kernel and
userspace. Each of the 6 microspeakers is exposed to userspace as an individual device,
and has its own codec chip. And they sound absolutely awful in Linux, where we don't
have CoreAudio to paper over their shortcomings.
Now you may have noticed that the boot chime sounds good, so how does that work if all the
processing is done by macOS? In a testament
to Apple's ridiculous attention to detail, iBoot has a unique boot chime for each
machine, with all the relevant multichannel DSP pre-applied to the signal. All the firmware does is power up the speakers, stream these samples, then shut them off again until XNU brings them back up.
Challenge accepted!
After the initial disappointment of these revelations wore off, we began brainstorming
possible solutions. Audio support is obviously a must, but was absolutely not shippable in
its default state. An early idea floated by marcan was to sniff around the macOS filesystem
and/or memory for the DSP data needed and copy it. This initially seemed promising; however, Apple do something weird. Their DSP is not stored in the standard format of an impulse response: they actually leverage the full suite of CoreAudio plugins, the parameters for which are stored as base64-encoded binary data. It may have been possible to reverse engineer this format
in some capacity and extract useful information out of it, but even so Apple have quite
an eclectic mix of plugins that don't map particularly well to anything readily available in
the Linux world.
But that got me thinking. None of us particularly like the macOS
sound profile. Sure, it's impressive and a damn sight better than any other laptop, but
it's... too flashy. Apple do all these little tricks for spatial audio, stereo imaging, transfer functions compensating for where your head is; it's endless. It's a device
called Pro that's been engineered to sound like a consumer device. This is not what
pros want. Pros want a natural, refined stereo speaker array. A reference system.
So what if we could turn this situation into an opportunity? What if, instead of just
copying macOS, we could do something better than macOS? We have a blank canvas
on which we can paint anything we want. With the paints and brushes being the Linux audio stack, though, naturally no one was really that keen to jump in. At the time, however, I was on a break from uni and looking for something to do.
How Bad Could It Possibly Be?