Audio Analysis on an iPhone

Of the four main mobile platforms (Android, iOS, BlackBerry and Windows Mobile), we have chosen to target iOS.  We chose it because of the expertise of the developers on the project, and because iOS provides a native application programming interface (API) for performing signal processing and linear algebra calculations.

In “A picture of the challenge ahead” you can see the main steps involved in identifying a bat call.  The first major step is “(3) Enable call isolation”.

Enable call isolation

Fundamentally, an audio file is simply a long list of numbers specifying the amplitude of a sound wave over time.  If you imagine a sound wave moving through the air and hitting the diaphragm in a microphone, the amplitude is the distance that the diaphragm moves from its resting position.

This is an analogue signal.  For a computer to process it, it must be converted to a digital representation.  This is called “sampling”: the position of the microphone’s diaphragm is measured many times a second.  The Nyquist-Shannon sampling theorem states that to capture a signal containing frequencies up to X Hz without losing information, you need to sample at a rate of more than 2X Hz.  The limit of human hearing is approximately 20 kHz, which therefore requires a sample rate of a little over 40 kHz.  This is why CDs are sampled at 44.1 kHz, i.e. each second of recording on a CD contains 44,100 amplitude measurements, enough to capture the highest frequency we can hear.
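
As a rough illustration of sampling, the Python sketch below measures a pure tone at regular intervals and works out the minimum sample rate the theorem implies.  The function names and the 1 kHz test tone are assumptions made for this example; they are not part of the project's code.

```python
import math

def sample_tone(freq_hz, sample_rate_hz, duration_s):
    """Return amplitude measurements of a sine tone, taken sample_rate_hz times a second."""
    count = round(sample_rate_hz * duration_s)
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(count)]

def nyquist_rate(max_freq_hz):
    """Sample rate that must be exceeded to capture content up to max_freq_hz."""
    return 2 * max_freq_hz

# One second of a (hypothetical) 1 kHz tone sampled at the CD rate.
samples = sample_tone(1_000, 44_100, 1.0)
print(len(samples))           # 44100 measurements in one second
print(nyquist_rate(20_000))   # 40000, the bound for the limit of human hearing
```

The list returned by sample_tone is exactly the "long list of numbers" described above: one amplitude value per sampling instant.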

A 10x “time expansion” bat detector records 1 second of audio and plays it back over 10 seconds.  For a Pipistrelle this reduces its call frequency from about 50 kHz to 5 kHz, which can be recorded by ordinary equipment with a sample rate of 44.1 kHz.  Without the time expansion we would require specialist equipment that could sample at 100 kHz or more.
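
The arithmetic behind time expansion can be sketched in a few lines of Python.  The helper names here are hypothetical, and the 50 kHz figure is the approximate Pipistrelle frequency quoted above.

```python
def time_expanded_frequency(freq_hz, expansion_factor):
    """Frequency heard when a recording is played back expansion_factor times slower."""
    return freq_hz / expansion_factor

def can_record(freq_hz, sample_rate_hz):
    """True if sample_rate_hz is more than twice freq_hz (the sampling theorem's condition)."""
    return sample_rate_hz > 2 * freq_hz

call_hz = 50_000                                    # approximate Pipistrelle call
expanded_hz = time_expanded_frequency(call_hz, 10)  # 10x time expansion

print(expanded_hz)                       # 5000.0
print(can_record(expanded_hz, 44_100))   # True: ordinary equipment is enough
print(can_record(call_hz, 44_100))       # False: the raw call needs more than 100 kHz
```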

Once we have our recording, we need to identify where within it there are actual bat calls.  What we have is a measurement of audio amplitude against time, which isn’t very easy to analyse, because we want to find where certain frequency characteristics occur within the audio.  To convert from the time domain to the frequency domain we need to perform a Fourier transform.  Because we have digitally sampled data, we need a discrete Fourier transform, and we will compute it with an efficient algorithm called the fast Fourier transform (FFT).  Apple’s iOS provides a native API for performing FFTs and other linear algebra.  We will use this to process the time-expanded bat calls and detect where within the audio there is a call.
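
To make the time-to-frequency conversion concrete, here is a minimal pure-Python sketch of a radix-2 FFT that picks out the strongest frequency in a block of samples.  It is only an illustration of the idea, not the iOS API: the function names, the 1024-sample block size and the synthetic 5 kHz tone (standing in for a time-expanded call) are all assumptions for this example.

```python
import cmath
import math

def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])   # transform of the even-indexed samples
    odd = fft(samples[1::2])    # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def dominant_frequency(samples, sample_rate_hz):
    """Return the centre frequency (Hz) of the strongest non-DC bin."""
    spectrum = fft(samples)
    half = len(samples) // 2    # bins above this mirror the lower half for real input
    peak_bin = max(range(1, half), key=lambda k: abs(spectrum[k]))
    return peak_bin * sample_rate_hz / len(samples)

SAMPLE_RATE = 44_100
BLOCK_SIZE = 1024               # assumed block length, a power of two

# Synthetic 5 kHz tone standing in for a time-expanded Pipistrelle call.
tone = [math.sin(2 * math.pi * 5_000 * n / SAMPLE_RATE) for n in range(BLOCK_SIZE)]

# The result is close to 5000 Hz; resolution is SAMPLE_RATE / BLOCK_SIZE, about 43 Hz.
print(dominant_frequency(tone, SAMPLE_RATE))
```

Running the same calculation over successive blocks of a recording would show how the frequency content changes over time, which is one plausible way to spot where a call begins and ends.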

The next blog posting will go into this in detail.