A brief foreword

This is the first in a series of articles I intend to write on the experience of building a userspace audio stack for Asahi Linux. We're going to discuss everything from the horrific state of the Linux userspace audio stack, to how audio engineers get you hearing sounds that are not being physically produced by a speaker.

Working on this over the last two years has been an absolute blast, helped in no small part by the Asahi community being made up of some of the smartest and most welcoming people one could hope to work with. As a first experience with semi-serious work on a FOSS project, you really could not hope for anything better.

Reverse Engineering a Speaker - Part 1

Finding something to do

Apple's MacBook Pro line has always had a reputation for delivering built-in audio quality that is far superior to almost every other similar device. The redesigned Apple Silicon 14" and 16" models take this further, in my opinion setting a whole new standard for what consumers should seek, nay, demand from manufacturers.

When I first got a 14" MacBook Pro with the intention of contributing to Asahi Linux, I ran into a bit of an issue. At the time, I had very little experience hacking on the kernel, very little practical C experience in general, was studying medicine, and had no time to learn about any of it. Well shit. That is, until Martin "povik" Povišer wrote a functioning ALSA driver for the Apple Silicon platform.

A history of microspeakers

Small speakers have never been good. The market segment that they serve has traditionally not been discerning connoisseurs. They are therefore manufactured with cheap, crappy materials, and to a very poor mechanical standard. For a long time, this was simply the accepted state of affairs - small speakers suck, there's nothing we can do, it's just how it is. Dark times indeed.

Traditionally, because of how delicate and pathetic these "microspeakers" are, embedded devices that utilise them sometimes shipped some sort of firmware which handled driving them. This "handling" was almost always just some basic overexcursion and overcurrent protection. The former prevents the speaker from ripping itself apart, and the latter prevents the speaker from melting itself. Doing anything else was just too computationally expensive. Famously, certain Chromebooks omitted this protection, and it was possible to destroy their speakers from userspace.

Shifting gears to the other end of the spectrum for a second, we need to talk about performance venues. Most modern large-venue speaker arrays are controlled by a drive rack: a rack housing some sort of DSP unit feeding into beefy, expensive Class D amps. It takes an input, usually stereo, and applies all sorts of room correction effects and EQ to it so that the revolving door of engineers doesn't need to worry about the nuances of the room. This concept extends to modern mixing consoles, too, which are usually entirely digital. Some are even just control surfaces and processors for networked analogue preamp racks, which live closer to the stage.

This does kinda sound like something we might benefit from on small devices though, right? We know that small speakers suck, so what if we could apply the drive rack concept to small devices? Unfortunately, the DSPs in professional audio gear are usually thousands of dollars per unit, proprietary implementations of proprietary ISAs (which are usually VLIW-based), and generally just awful, awful things to try and program. Oh, and they're as big and as power-hungry as a consumer GPU too. Imagine shoving AMD's Navi 31 into a Bluetooth speaker. Yeah, no thanks.

We don't really need a giant pro-grade DSP though. Most pro DSPs can handle 32+ channels, each with 2 or more digital effects running. In most devices with sucky small speakers, though, we usually only have two channels of audio. As it turns out, floating point math is now really fast and really cheap. Even jellybean microcontroller-tier cores are fast enough to do basic DSP on a 16-bit 44.1 kHz stereo signal. See where we're going with this?
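To make "basic DSP" concrete, here's a minimal, hypothetical C sketch (not code from any real driver; all names are mine) of the kind of workload involved: a single biquad filter section, the building block of parametric EQ, run over an interleaved 16-bit stereo buffer. Per sample per channel it's five multiplies and four adds, which even a small microcontroller core can sustain at 44.1 kHz.

```c
#include <stddef.h>
#include <stdint.h>

/* Direct Form I biquad. Coefficients follow the usual b0..b2 / a1..a2
 * convention with a0 normalised to 1. */
typedef struct {
    float b0, b1, b2, a1, a2;   /* filter coefficients */
    float x1, x2, y1, y2;       /* per-channel filter state */
} biquad;

static float biquad_step(biquad *f, float x)
{
    float y = f->b0 * x + f->b1 * f->x1 + f->b2 * f->x2
            - f->a1 * f->y1 - f->a2 * f->y2;
    f->x2 = f->x1; f->x1 = x;
    f->y2 = f->y1; f->y1 = y;
    return y;
}

/* Filter an interleaved 16-bit stereo buffer in place, one biquad
 * state per channel. Samples are normalised to [-1, 1), filtered,
 * clipped back into int16 range, and requantised. */
static void process_stereo(biquad f[2], int16_t *buf, size_t frames)
{
    for (size_t i = 0; i < frames; i++) {
        for (int ch = 0; ch < 2; ch++) {
            float y = biquad_step(&f[ch], buf[2 * i + ch] / 32768.0f);
            if (y > 32767.0f / 32768.0f) y = 32767.0f / 32768.0f;
            if (y < -1.0f) y = -1.0f;
            buf[2 * i + ch] = (int16_t)(y * 32768.0f);
        }
    }
}
```

A real EQ chain is just a handful of these sections in series per channel, each with coefficients derived from the target response.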

Smoke and mirrors

This would be a very quick and boring story if it ended with, "And then we turned on the ASoC driver and the speakers worked and we all rejoiced." Turns out that macOS handles everything in CoreAudio, at the border between the kernel and userspace. Each of the 6 microspeakers is exposed to userspace as an individual device, and has its own codec chip. And they sound absolutely awful in Linux, where we don't have CoreAudio to paper over their shortcomings.

Now, you may have noticed that the boot chime sounds good, so how does that work if all the processing is done by macOS? In a testament to Apple's ridiculous attention to detail, iBoot has a unique boot chime for each machine, with all the relevant multichannel DSP pre-applied to the signal. All the firmware does is power up the speakers, stream these samples, then shut them off again until XNU brings them back up.

Challenge accepted!

After the initial disappointment of these revelations wore off, we began brainstorming possible solutions. Audio support is obviously a must, but was absolutely not shippable in its default state. An early idea floated by marcan was to sniff around the macOS filesystem and/or memory for the DSP data needed and copy it. This initially seemed promising; however, Apple do something weird. Their DSP is not in the standard format of an impulse response; they actually leverage the full suite of CoreAudio plugins, the parameters for which are stored as base64-encoded binary data. It may have been possible to reverse engineer this format in some capacity and extract useful information out of it, but even so, Apple have quite an eclectic mix of plugins that don't map particularly well to anything readily available in the Linux world.
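For reference, the "standard format of an impulse response" mentioned above amounts to a FIR filter: the correction is applied by convolving the signal with a fixed array of taps measured from the speaker. A hedged, illustrative sketch (the function and names are mine, not from any real project):

```c
#include <stddef.h>

/* Convolve an input signal with an impulse response (FIR taps).
 * out[i] = sum over k of taps[k] * in[i - k], with the sum truncated
 * at the start of the signal. This is what "shipping DSP as an
 * impulse response" would let a player apply directly. */
static void fir_apply(const float *taps, size_t ntaps,
                      const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float acc = 0.0f;
        for (size_t k = 0; k < ntaps && k <= i; k++)
            acc += taps[k] * in[i - k];
        out[i] = acc;
    }
}
```

Since Apple instead store per-plugin parameter blobs rather than taps like these, there was no such array to simply lift and replay.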

But that got me thinking. None of us particularly like the macOS sound profile. Sure, it's impressive and a damn sight better than any other laptop, but it's... too flashy. Apple do all these little tricks for spatial audio, stereo imaging, transfer functions compensating for where your head is; it's endless. It's a device called "Pro" that's been engineered to sound like a consumer device. This is not what pros want. Pros want a natural, refined stereo speaker array. A reference system.

So what if we could turn this situation into an opportunity? What if, instead of just copying macOS, we could do something better than macOS? We have a blank canvas on which we can paint anything we want. Of course, with the paints and brushes being the Linux audio stack, no one was really that keen to jump in. At the time, however, I was on a break from uni and looking for something to do.

How Bad Could It Possibly Be?

Next time