31 July 2016
I’m working on a new information retrieval and machine learning project - but unlike previous projects, which involved large amounts of text, this one involves sound.
My goal is to create a small, self-contained robot that can listen to, and learn from, its surroundings - building up sonic fingerprints of a number of locations. Later, it should be possible for the robot to tell what location it’s at simply by listening, and comparing what it hears to previous experiences.
I’m just getting started with the project, and my focus is on writing simple software that can record sound, extract features, fit machine learning models, then classify previously unheard sound.
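A minimal sketch of that pipeline, assuming librosa for feature extraction and scikit-learn for the classifier (neither is settled yet; the file names, location labels, and the choice of MFCC features are all illustrative placeholders):

```python
# Sketch only: librosa and scikit-learn are assumptions, and the
# file names / location labels below are hypothetical placeholders.
import librosa
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def extract_features(path):
    """Summarize a recording as a fixed-length feature vector."""
    y, sr = librosa.load(path)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)  # average each coefficient over time

# Training clips, labeled by the location where they were recorded.
clips = [("kitchen_01.wav", "kitchen"), ("kitchen_02.wav", "kitchen"),
         ("street_01.wav", "street"), ("street_02.wav", "street")]
X = np.array([extract_features(path) for path, _ in clips])
labels = [label for _, label in clips]

model = RandomForestClassifier().fit(X, labels)

# Classify a previously unheard recording.
print(model.predict([extract_features("unknown.wav")]))
```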
The field of audio/music information retrieval is very well-developed, so I have a lot to read and learn from.
As a simple hello-world, I took two existing sound files: a Mozart piano sonata and a recent track by Rihanna, and visualized them as chromagrams. Visually, the difference between the two tracks is clear, which suggests that the notes used in a piece of music might make useful features for a machine learning model. I’ve yet to discover whether this is the case, or whether chromatic features are useful for more ambient sound.
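For reference, a chromagram like those can be computed and plotted in a few lines; this sketch assumes librosa, and the file name stands in for either track:

```python
# Sketch, assuming librosa; "sonata.wav" stands in for either track.
import librosa
import librosa.display
import matplotlib.pyplot as plt

y, sr = librosa.load("sonata.wav")
chroma = librosa.feature.chroma_stft(y=y, sr=sr)  # 12 pitch classes x time

librosa.display.specshow(chroma, y_axis="chroma", x_axis="time")
plt.colorbar()
plt.title("Chromagram")
plt.show()
```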
It’s worth noting that, for now, I’m forgoing any thought of using deep learning techniques to create vector representations of sound for use in classification tasks. I might try to tackle this down the road. Crawl, walk, run, etc.
I’ll write up what I find as the project continues.