Bat BioAcoustics

This post continues the discussion of how to perform audio analysis on an iPhone in order to recognise bat calls.  For the previous post, which covers some of the mathematical techniques, please see Fourier Transforms on an iPhone.

The last blog post ended with a very simplistic way to detect a bat call within an audio sample, and a promise to explain some more sophisticated methods.  Whilst researching signal detection algorithms, we became aware of an existing open source program that does something very similar to what we are attempting – Bat BioAcoustics, written by Chris Scott of Leeds University.

BioAcoustics is written in C++, and although it’s a desktop application, a lot of the algorithms in it are portable to any system which has a C++ compiler.  Fortunately, an iPhone is one such system.

In order to analyse bat calls and classify them by species, BioAcoustics performs the following steps:

  1. Load audio from a WAV file into memory.
  2. Detect each individual bat call within the file.
  3. Extract audio features from each call, e.g. maximum frequency and bandwidth.
  4. Use a Support Vector Machine (SVM) model to classify each call.
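To make step (2) concrete, here is a very rough sketch of what an energy-threshold call detector might look like in C++.  This is our own illustrative code – the names are invented, and BioAcoustics' actual detector is considerably more sophisticated:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// A detected call, as a half-open range of sample indices.
struct Call { std::size_t start; std::size_t end; };

// Sketch of energy-threshold detection: split the signal into
// fixed-size windows, and treat each run of windows whose mean
// energy exceeds the threshold as one call.
std::vector<Call> detectCalls(const std::vector<float>& samples,
                              std::size_t window, float threshold)
{
    std::vector<Call> calls;
    bool inCall = false;
    std::size_t callStart = 0;
    for (std::size_t i = 0; i + window <= samples.size(); i += window) {
        // Mean energy of this window
        float energy = 0.0f;
        for (std::size_t j = i; j < i + window; ++j)
            energy += samples[j] * samples[j];
        energy /= window;
        if (energy > threshold && !inCall) {
            inCall = true;          // A call has started
            callStart = i;
        } else if (energy <= threshold && inCall) {
            inCall = false;         // The call has ended
            calls.push_back({callStart, i});
        }
    }
    if (inCall) calls.push_back({callStart, samples.size()});
    return calls;
}
```

A fixed threshold works for clean recordings; real detectors adapt the threshold to the background noise level.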

It has a dependency on the following open source libraries:

  • Qt – to provide a desktop GUI toolkit
  • libsndfile – to read audio from a WAV file into memory
  • FFTW – for Fourier analysis
  • libsvm – to run the SVM models

As mentioned in previous blog posts, Apple provide APIs that can perform audio and signal analysis, and they also provide a GUI toolkit for iPhones.  So if we can replace the first three of these libraries – Qt, libsndfile and FFTW – with Apple's native equivalents, we should be able to port the signal analysis and call matching sections of Bat BioAcoustics to an iPhone.

Finally, BioAcoustics is released under the GPL open source license – which is the same license we will be releasing code from BatMobile under – so there are no legal obstacles to doing this.

Reading Audio Files

The previous two posts, Audio Analysis on an iPhone and Fourier Transforms on an iPhone, have covered some of the theory of audio sampling and analysis.  What hasn't been mentioned is how audio samples are actually loaded into our program.  There are fundamentally two ways to do this:

  1. Realtime sampling of incoming audio.  For bat calls, this would be the output from a time expansion bat detector.
  2. Loading pre-recorded audio from a file.

Initially we will be using method (2).  As mentioned before, it’s much easier to load samples from a file, rather than make a bat squeak into a microphone on demand!

So we need to read from an audio file on disk to a list of samples held in an array in our program.  Apple provide tools to do this, in the form of an API called Core Audio.  As with the Accelerate/vDSP framework, this is quite a low level API in C, and it has a reputation for being tricky to use.  Fortunately for us, we only need to use a small part of it that deals with reading and converting audio formats.

Loading Audio

Obtain a reference to the audio file:

// myFile is the filesystem path and filename we are loading
CFStringRef str = CFStringCreateWithCString(
    kCFAllocatorDefault, myFile, kCFStringEncodingUTF8);
CFURLRef inputFileURL = CFURLCreateWithFileSystemPath(
    kCFAllocatorDefault, str, kCFURLPOSIXPathStyle, false);

ExtAudioFileRef fileRef;
ExtAudioFileOpenURL(inputFileURL, &fileRef);

Transform the audio file into the format we want:

// "sample"  - An instantaneous amplitude of the signal in a 
// single audio channel, represented as an integer, 
// floating-point, or fixed-point number. 

// "channel" - A discrete track of audio. A monaural recording has 
// exactly one channel.

// "frame"   - A set of samples that contains one sample from 
// each channel in an audio data stream

// "packet"  - An encoding-defined unit of audio data comprising 
// one or more frames. For PCM audio, each packet corresponds to 
// one frame.

// Packets contain frames, and each frame contains one sample 
// per channel.  For mono PCM there is one channel, so one 
// sample per frame and one frame per packet: each packet 
// contains exactly one sample.

    // Set up audio format we want the data in
    // Each sample is of type Float32
    AudioStreamBasicDescription audioFormat;
    audioFormat.mSampleRate = 44100;
    audioFormat.mFormatID = kAudioFormatLinearPCM;
    audioFormat.mFormatFlags = kLinearPCMFormatFlagIsFloat | kLinearPCMFormatFlagIsPacked;
    audioFormat.mBitsPerChannel = sizeof(Float32) * 8;
    audioFormat.mChannelsPerFrame = 1; // Mono
    audioFormat.mBytesPerFrame = audioFormat.mChannelsPerFrame * sizeof(Float32);  // == sizeof(Float32)
    audioFormat.mFramesPerPacket = 1;
    audioFormat.mBytesPerPacket = audioFormat.mFramesPerPacket * audioFormat.mBytesPerFrame; // = sizeof(Float32)

    // Apply our audio format to the Extended Audio File
    ExtAudioFileSetProperty(fileRef,
        kExtAudioFileProperty_ClientDataFormat,
        sizeof(AudioStreamBasicDescription), // size of audioFormat
        &audioFormat);

Allocate some space in memory:

    int numSamples = 1024; // How many samples to read in at a time
    UInt32 sizePerPacket = audioFormat.mBytesPerPacket; // = sizeof(Float32) = 4 bytes
    UInt32 packetsPerBuffer = numSamples;
    UInt32 outputBufferSize = packetsPerBuffer * sizePerPacket;

    // outputBuffer points at the memory we have reserved
    UInt8 *outputBuffer = (UInt8 *)malloc(outputBufferSize);

    AudioBufferList convertedData;
    convertedData.mNumberBuffers = 1;    // Set this to 1 for mono
    convertedData.mBuffers[0].mNumberChannels = audioFormat.mChannelsPerFrame;  // also = 1
    convertedData.mBuffers[0].mDataByteSize = outputBufferSize;
    convertedData.mBuffers[0].mData = outputBuffer;

And then finally read the audio in:

    std::vector<float> samples; // Final destination for the audio samples
    UInt32 frameCount = numSamples;
    while (frameCount > 0) {
        frameCount = numSamples; // Ask for a full buffer each time round
        ExtAudioFileRead(fileRef, &frameCount, &convertedData);
        if (frameCount > 0) {
            AudioBuffer audioBuffer = convertedData.mBuffers[0];
            // Cast from the audio buffer to a C style array...
            float *samplesAsCArray = (float *)audioBuffer.mData;
            // ...and append into our final samples vector
            samples.insert(samples.end(),
                           samplesAsCArray, samplesAsCArray + frameCount);
        }
    }

After all that, we finally have our audio samples in a C++ vector, which we can then analyse with Apple's digital signal processing API.
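Even before reaching for vDSP, quite crude portable C++ can pull useful information out of that vector.  As an illustration (our own sketch, not part of BioAcoustics), here is a zero-crossing estimate of the dominant frequency of a clean tone:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Crude dominant-frequency estimate via zero-crossing counting.
// A sine wave crosses zero twice per cycle, so
// frequency ~= crossings / 2 / duration.
float estimateFrequency(const std::vector<float>& samples, float sampleRate)
{
    std::size_t crossings = 0;
    for (std::size_t i = 1; i < samples.size(); ++i)
        if ((samples[i - 1] < 0.0f) != (samples[i] < 0.0f))
            ++crossings;                      // Sign changed between samples
    float duration = samples.size() / sampleRate;
    return (crossings / 2.0f) / duration;
}
```

This only works for a clean, single-frequency signal; for real bat calls the Fourier techniques from the previous post are needed.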

One important caveat is that we now have the entire audio file uncompressed in memory.  This is fine for short recordings (under a minute or so), but for longer recordings we would need a way to process the file in sections.
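Since the read loop above already pulls the audio in 1024-sample chunks, one option would be to summarise each section as it arrives and discard the raw samples, rather than accumulating them all.  A sketch of the idea, with our own invented names, keeping only the peak amplitude of each section:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch: instead of keeping every sample, hand each fixed-size
// section to a summarising step and keep only its result.
// Here the summary is just the peak amplitude of the section.
std::vector<float> sectionPeaks(const std::vector<float>& samples,
                                std::size_t sectionSize)
{
    std::vector<float> peaks;
    for (std::size_t i = 0; i < samples.size(); i += sectionSize) {
        std::size_t end = std::min(i + sectionSize, samples.size());
        float peak = 0.0f;
        for (std::size_t j = i; j < end; ++j)
            peak = std::max(peak, std::fabs(samples[j]));
        peaks.push_back(peak); // Section summarised; its samples could now be discarded
    }
    return peaks;
}
```

In a real streaming version, the summarising step would run inside the read loop on each buffer returned by ExtAudioFileRead, so memory use stays constant regardless of file length.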