RPi audio processing
Introduction – I can’t hear a word you’re saying
There are RPi write-ups on all sorts of topics, but when it comes to Big Audio Dynamite, fuggedaboutit. When the image and video processing topics were outlined at the outset of the RPi studies, I didn’t give sound processing any thought. However, as the weeks progressed, I began to realize that the RPi could overshadow the Arduino as the platform of choice for the sensor helmet project. One of the key sensors was an acoustic level detector whose readings would be used to create heat maps of noise levels.
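For the noise heat maps, each reading eventually has to collapse a buffer of samples into a single level. Here is a minimal Python sketch of that step, assuming 16-bit signed PCM samples have already been captured into a list (capture isn’t shown, and `rms_dbfs` is a hypothetical helper name). It references the RMS to a full-scale square wave. The AES17 convention references a full-scale sine instead, which shifts readings by about 3 dB, and neither is a calibrated sound pressure level without a calibrated microphone.

```python
import math

FULL_SCALE = 32768  # magnitude of the most negative signed 16-bit sample

def rms_dbfs(samples):
    """RMS level of 16-bit PCM samples in dB relative to full scale (square-wave reference)."""
    if not samples:
        raise ValueError("need at least one sample")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")  # digital silence has no finite dB level
    return 20 * math.log10(rms / FULL_SCALE)

# A half-scale square wave comes out at about -6.02 dBFS.
print(round(rms_dbfs([16384, -16384] * 10), 2))
```

One such number per location and time slot is all a heat map cell needs.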
Enter the RPi and its measly acoustic offering… you have to be kidding me, right? Sadly, I’m not. Maybe I had expectations of my search results that weren’t aligned with reality, but there should have been more than what came back. When it comes to video and images, it seems everyone has a post about how to do that on an RPi. The few suggestions I did find focused on add-on boards that offered high fidelity. One post even suggested that the RPi wasn’t up to the task of real-time audio processing because it isn’t a very fast computer (http://www.drdobbs.com/embedded-systems/slow-fourier-transform/240159088).
I’m sorry, but I can’t buy that. I’ve seen the RPi do some amazing things, and I’m hard pressed to believe it can’t perform advanced audio processing effectively. For that reason, I’m going to show how SPEK can be used on an RPi to render an audio FFT faster than real time. I’ll also go through what SPEK is, how to install it, and a typical example of its use.
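Part of the “not a fast computer” argument comes down to which transform you run. A direct DFT needs on the order of N^2 complex multiply-adds per frame, while a radix-2 FFT needs on the order of N*log2(N): for a 4096-point frame that is about 16.8 million versus about 49,000. A minimal pure-Python sketch of both (the function names are illustrative, and the FFT assumes a power-of-two length) shows they produce the same spectrum:

```python
import cmath

def dft(x):
    """Direct discrete Fourier transform: O(N^2) complex multiply-adds."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT: O(N log N). len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle           # butterfly, lower half of the spectrum
        out[k + n // 2] = even[k] - twiddle  # butterfly, upper half of the spectrum
    return out

# Both transforms agree on an arbitrary 64-sample signal.
signal = [float(i % 5) for i in range(64)]
assert all(abs(a - b) < 1e-9 for a, b in zip(dft(signal), fft(signal)))
```

Libraries such as FFTW go much further (planning, SIMD, real-input transforms), but even this naive recursion shows why an FFT-based spectrum is far cheaper than the direct sum.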
Purpose – That deaf, dumb, and blind kid sure plays a mean pinball
Sound processing can reveal a great deal about the environment being sampled. Sound can be used to locate, identify, quantify, and/or isolate objects, and in that respect it can be more powerful than a visual system. This appears to be why computational audio processing skills are so highly sought after.
Unfortunately, crippleware is a big reason why audio processing is not as prevalent as it could be, thanks in large part to the fallout from MP3 misuse and DRM. I’ve seen the ability to record from the PC’s sound card removed from newer versions of Windows.
So the claim that the RPi doesn’t have much to offer in the way of audio is simply untrue. If the RPi can handle real-time effects processing for a guitar, it can handle FFT, triangulation, or voice recognition.
Detail – Freedom…do you spek it!?
My wife and I had the chance to go to Maui a few months back. We stayed Upcountry in a rented house away from the crowds, which was much more enjoyable than staying at a resort or some other high-density location.
In the mornings we could hear several different types of birds, and she asked me whether there was a way to tell what kind of birds they were by their sound.
Instinctively, I began searching for an iPhone app that could do that. The closest I was able to get was the Merlin Bird ID app from the Cornell Lab of Ornithology (http://www.birds.cornell.edu), but it only covered a narrow range of species and was based on visual cues. I did find another program that used sound as an identifier, but it wasn’t available to the public.
With nothing to identify the birds, I decided to try recording them. I didn’t want to use the built-in Voice Memos app to do this, so I did a search and came across SPEK.
SPEK is an acoustic spectrum analyser that runs on Mac, Windows, or Linux. It renders the FFT of an audio file as a spectrogram, which can be saved as an image. It wasn’t of much use on my vacation in Maui, but it is useful now. It was simple to install; just enter this:
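Conceptually, a spectrogram like SPEK’s is a short-time Fourier transform: slice the signal into overlapping frames, window each frame, FFT it, and convert the magnitudes to decibels. The following is a minimal stdlib-only Python sketch of that idea, not SPEK’s code; the frame size, hop, Hann window, and function names are illustrative assumptions.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)] +
            [even[k] - tw[k] for k in range(n // 2)])

def spectrogram(samples, frame_size=1024, hop=512):
    """Short-time Fourier transform: one list of dB magnitudes per frame.

    Each frame holds bins 0..frame_size/2, spaced sample_rate/frame_size Hz apart.
    """
    if frame_size < 2 or frame_size & (frame_size - 1):
        raise ValueError("frame_size must be a power of two >= 2")
    # Periodic Hann window to reduce spectral leakage between bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_size)
              for i in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + i] * window[i] for i in range(frame_size)]
        spectrum = fft(chunk)[: frame_size // 2 + 1]
        # Small offset avoids log10(0) for empty bins.
        frames.append([20 * math.log10(abs(c) + 1e-12) for c in spectrum])
    return frames

rate = 8000  # assumed sample rate for this demo
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(rate)]  # 1 s of 1 kHz
frames = spectrogram(tone, frame_size=256, hop=128)
peak_bin = max(range(len(frames[0])), key=frames[0].__getitem__)
print(peak_bin * rate / 256)  # 1000.0: bin 32 of 129
```

Each inner list is one vertical slice of the picture; mapping the dB values to colors and stacking the slices left to right gives the familiar spectrogram image.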
sudo apt-get install spek
Once it was installed, I copied an audio file to the RPi and ran SPEK on it. Since SPEK is a graphical application, I had to RDP into the RPi and log in to an X Window session. From there I entered the command:
spek SG_TOLBNY.mp3
It chugged away and rendered the spectrogram in less time than the file takes to play. The fact that it managed this while I had “burdened” the RPi with an X Window session just goes to show how robust the RPi is.
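Rendering faster than playback can be expressed as a real-time factor (RTF): processing time divided by the audio’s duration, with anything under 1.0 beating real time. A minimal Python sketch, assuming decoded mono samples in a list; `block_peaks` is a hypothetical stand-in workload, not what SPEK does, so the number it prints says nothing about SPEK’s own speed.

```python
import math
import time

def duration_seconds(num_samples, sample_rate):
    """Playback time of num_samples samples per channel at sample_rate Hz."""
    return num_samples / sample_rate

def real_time_factor(process, samples, sample_rate):
    """Wall-clock time of process(samples) divided by the audio's duration.

    A result below 1.0 means the processing finished faster than real time.
    """
    if not samples:
        raise ValueError("need at least one sample")
    start = time.perf_counter()
    process(samples)
    elapsed = time.perf_counter() - start
    return elapsed / duration_seconds(len(samples), sample_rate)

def block_peaks(x, block=1024):
    """Stand-in workload: peak absolute value of each block of samples."""
    return [max(abs(v) for v in x[i:i + block]) for i in range(0, len(x), block)]

rate = 44100  # assumed CD-quality sample rate
audio = [math.sin(2 * math.pi * 440 * t / rate) for t in range(rate * 5)]  # 5 s, 440 Hz
rtf = real_time_factor(block_peaks, audio, rate)
print(f"RTF = {rtf:.3f} ({'faster' if rtf < 1 else 'not faster'} than real time)")
```

Swapping `block_peaks` for an FFT-based function gives a rough, repeatable way to check an “faster than real time” claim on the RPi itself.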
I didn’t notice the RPi having any trouble under load. Cacti was monitoring it the whole time, and the render didn’t look like a big deal to the RPi: based on the Cacti results, it showed no signs of a bottleneck. There you have it.
Relations – Mushroom Pi is more like it
Now that the idea that the RPi can’t handle real-time audio processing has been put to rest, I’d like to know why more people aren’t developing this feature set. Come on, this is a great opportunity for kids to gain entry into a high-demand field.
What about FFTW, which the post claiming the RPi isn’t a fast computer mentioned? I can’t say how well it would perform on the RPi. Even so, audio processing shouldn’t be presented as a big hurdle for the RPi.
Summary – Can’t you hear me knocking?
In this post I showed that the RPi can process encoded audio faster than real time using SPEK: rendering the FFT took less time than listening to the file. The RPi is capable of handling 2-channel, 44.1 kHz, 16-bit audio and rendering its FFT as a full-color spectrogram. I also showed how to install and use SPEK, and presented the results of the render.
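To put the stereo, 16-bit data rate in numbers, here is the arithmetic as a small Python sketch, assuming the standard 44.1 kHz rate. Decoded PCM arrives at 44,100 samples/s × 2 channels × 2 bytes = 176,400 bytes per second (1,411.2 kbit/s). The 2048-point FFT size is a hypothetical choice, since SPEK’s actual frame size isn’t covered here; with no overlap it works out to about 43 transforms per second across both channels.

```python
def pcm_bytes_per_second(sample_rate, channels, bits_per_sample):
    """Raw (decoded) PCM data rate in bytes per second."""
    return sample_rate * channels * bits_per_sample // 8

def ffts_per_second(sample_rate, channels, fft_size, hop=None):
    """Transforms needed per second of audio; hop defaults to fft_size (no overlap)."""
    hop = fft_size if hop is None else hop
    return sample_rate / hop * channels

rate, channels, bits = 44100, 2, 16  # CD-quality stereo
print(pcm_bytes_per_second(rate, channels, bits))             # 176400
print(pcm_bytes_per_second(rate, channels, bits) * 8 / 1000)  # 1411.2 (kbit/s)
print(round(ffts_per_second(rate, channels, 2048), 1))        # 43.1
```

Halving the hop (50% overlap, common for smoother spectrograms) doubles the transform count, which is still a modest load for a 2048-point FFT.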
Using the RPi for audio processing is a reasonable expectation. It can handle it.