07 August 2016
I’m making incremental progress on my listening robot project. The ultimate goal is to build a little machine capable of listening to its surroundings, learning from the sounds it hears, and then - later - determining its location based on what it hears. A reasonable prototype robot might be able to remember 5-10 locations, and then tell those locations apart when asked to do so.
There are a number of ways of approaching the software part of the project. I’ve chosen to go down a classical, supervised machine learning route: extracting features from sound, then using them to fit a machine learning model. In essence, this means taking raw sound and measuring particular qualities of it. These qualities are associated with a label - say, the name of the location - and used to train a machine learning model.
My work in the last week has been to implement simple feature extraction techniques demonstrated in the off-the-shelf audio information extraction libraries Librosa and pyAudioAnalysis. I have been passing short snippets of music into their feature extraction functions to output specific features of that sound, such as the spectral centroid (a measure of the “brightness” of a sound) and the spectral contrast (a measure of the difference between peaks and valleys in the sound).
In software, I’m implementing feature extractor classes that can be passed some sound and return feature representations of it. I’m trying to get the structure right up front, so I don’t have a lot of refactoring and noodling to do later. I’m also trying to stay somewhat agnostic of the underlying library, so that libraries can be mixed and swapped as needed.
I’m getting close to just piping feature data into some dumb classifier - a crude proof of concept. I’m not sure whether information extraction techniques geared towards understanding music are useful for understanding ambient sound - if they’re not, I may need to adopt another approach.
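The “dumb classifier” step might look something like this - a sketch assuming scikit-learn, with fabricated feature numbers standing in for real extracted features from two imaginary locations:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Fabricated 2-D feature vectors (say, mean centroid and mean contrast)
# for snippets "recorded" in two locations - purely to show the shape
# of the pipeline, not real data.
rng = np.random.default_rng(0)
kitchen = rng.normal(loc=[1500.0, 20.0], scale=[100.0, 2.0], size=(20, 2))
street = rng.normal(loc=[3000.0, 35.0], scale=[100.0, 2.0], size=(20, 2))

X = np.vstack([kitchen, street])
labels = ["kitchen"] * 20 + ["street"] * 20

# About as dumb as classifiers get: k-nearest neighbours.
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, labels)

# Ask the "robot" where a new snippet came from.
print(clf.predict([[1520.0, 21.0]]))  # → ['kitchen']
```

With clusters this well separated any classifier would do; the point is only that once feature vectors and labels exist, the training and prediction steps are a few lines.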