Paul May

More Audio Feature Extraction

07 August 2016

I’m making incremental progress on my listening robot project. The ultimate goal is to build a little machine capable of listening to its surroundings, learning from the sound it hears, and then - later - being able to determine its location based upon what it hears. A reasonable prototype robot might be able to remember 5-10 locations, and then tell those locations apart when asked to do so.

There are a number of ways of approaching the software part of the project. I’ve chosen to go down a classical, supervised machine learning route - extracting features from sound, then using them to fit a machine learning model. In essence, this means taking raw sound, and noticing particular qualities of the sound. These qualities are associated with a label - say the name of the location - and used to train a machine learning model.

My work in the last week has been to implement simple feature extraction techniques demonstrated in the off-the-shelf audio information extraction libraries Librosa and pyAudioAnalysis. I have been passing short snippets of music into their feature extraction functions to output specific features of that sound, such as the spectral centroid (a measure of the "brightness" of a sound) and the spectral contrast (a measure of the difference between peaks and valleys in the sound).
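Librosa computes the spectral centroid over many short STFT frames; as a single-frame sketch of what the feature actually measures (a naive DFT, with all names and values my own, purely for illustration), the centroid is just the magnitude-weighted mean frequency of the spectrum:

```python
import cmath
import math

def spectral_centroid(signal, sample_rate):
    """Magnitude-weighted mean frequency of a signal's spectrum --
    higher values correspond to 'brighter' sounds."""
    n = len(signal)
    # Naive DFT over the positive-frequency bins (fine for a short snippet;
    # a real implementation would use an FFT over many short frames)
    mags = []
    for k in range(1, n // 2):
        bin_sum = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                      for t in range(n))
        mags.append(abs(bin_sum))
    freqs = [k * sample_rate / n for k in range(1, n // 2)]
    return sum(f * m for f, m in zip(freqs, mags)) / sum(mags)

# A pure 440 Hz tone should have its centroid at (roughly) 440 Hz
sr, n = 8192, 1024
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(n)]
print(round(spectral_centroid(tone, sr)))  # → 440
```

A real tone or recording spreads energy across many frequencies, which is exactly what moves the centroid up or down.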

In software, I’m implementing feature extractor classes that can be passed some sound, returning feature representations of the sound. I’m trying to get the structure of this right, up front, so I don’t have a lot of refactoring and noodling to do later. I’m also trying to stay somewhat agnostic about the underlying library, so that libraries can be mixed or swapped as needed.
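One library-agnostic shape for this is a common extractor interface that concrete classes implement. The class names here are hypothetical, and the toy extractor stands in for one that would wrap librosa or pyAudioAnalysis:

```python
from abc import ABC, abstractmethod

class FeatureExtractor(ABC):
    """Common interface: raw samples in, a flat feature vector out."""
    @abstractmethod
    def extract(self, signal, sample_rate):
        ...

class MeanAmplitudeExtractor(FeatureExtractor):
    """Toy stand-in for a real extractor (e.g. one wrapping librosa):
    a single feature, the mean absolute amplitude."""
    def extract(self, signal, sample_rate):
        return [sum(abs(s) for s in signal) / len(signal)]

# Extractors can be mixed and matched; their outputs concatenate
# into one feature vector for the downstream model.
extractors = [MeanAmplitudeExtractor()]
snippet = [0.0, 0.5, -0.5, 1.0]
features = [f for e in extractors for f in e.extract(snippet, 44100)]
print(features)  # → [0.5]
```

Because every extractor returns a plain list, swapping the backing library only touches the class internals, not the pipeline.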

I’m getting close to just piping feature data into some dumb classifier; getting to a crude proof of concept. I’m not sure whether information extraction techniques more geared towards understanding music are useful for understanding ambient sound - if they’re not, then I may need to adopt another approach.
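The "dumb classifier" could be as simple as one nearest neighbour over stored feature vectors. A minimal sketch, with the feature values and location labels invented purely for illustration:

```python
import math

def nearest_location(examples, query):
    """1-nearest-neighbour: return the label of the training example
    whose feature vector is closest (Euclidean distance) to the query."""
    return min(examples, key=lambda ex: math.dist(ex[0], query))[1]

# Invented (feature_vector, location_label) pairs, for illustration only
training = [
    ([0.20, 0.10], "kitchen"),
    ([0.90, 0.80], "street"),
]
print(nearest_location(training, [0.85, 0.75]))  # → street
```

Even something this crude is enough to check whether the extracted features separate locations at all before investing in a better model.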

Paul May is a researcher, interaction designer, and technologist from Dublin, Ireland. He is currently working with Memorial Sloan Kettering Cancer Center on smart health applications.

Extracting Information from Sound

31 July 2016

I’m working on a new information retrieval and machine learning project - but unlike previous projects that involved large amounts of text, this project involves sound.

My goal is to create a small, self-contained robot that can listen to, and learn from, its surroundings - building up sonic fingerprints of a number of locations. Later, it should be possible for the robot to tell what location it’s at, simply by listening, and comparing what it hears to previous experiences.

I’m just getting started with the project, and my focus is on writing simple software that can record sound, extract features, fit machine learning models, then classify previously unheard sound.

The field of audio/music information retrieval is very well-developed, so I have a lot to read and learn from.

As a simple hello-world, I took two existing sound files, a Mozart piano sonata and a recent track by Rihanna, and visualized them as chromagrams. Visually, the difference between the two tracks is clear, and this might be a clue that the notes used in a piece of music represent useful features for a machine learning model. I’ve yet to discover if this is the case, or if chromatic features are useful for more ambient sound.
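Librosa builds its chromagrams from an STFT; the underlying idea is just folding spectral energy into the 12 pitch classes. A single-frame, naive-DFT sketch of that folding (function names and values my own, for illustration):

```python
import cmath
import math

def chroma_vector(signal, sample_rate):
    """Fold DFT magnitudes into 12 pitch-class bins (C=0 ... B=11)."""
    n = len(signal)
    bins = [0.0] * 12
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        mag = abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                      for t in range(n)))
        # MIDI note number 69 is A4 = 440 Hz; pitch class = note mod 12
        pitch_class = round(69 + 12 * math.log2(freq / 440.0)) % 12
        bins[pitch_class] += mag
    return bins

# A 440 Hz tone is the note A (pitch class 9), so bin 9 should dominate
sr, n = 8192, 1024
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(n)]
c = chroma_vector(tone, sr)
print(c.index(max(c)))  # → 9
```

A chromagram is just a sequence of these 12-element vectors over time, which is why tonal music and ambient noise look so different when plotted.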

It’s worth noting that, for now, I’m forgoing any thought of using deep learning techniques to create vector representations of sound, for use in classification tasks. I might try to tackle this down the road. Crawl, walk, run etc.

I’ll write up what I find as the project continues.


05 September 2014

It’s been a very busy, and very important few weeks. Last week I had the great honour to watch my brother graduate from University of Chicago with a PhD in Anthropology. I am so proud of him. It takes a lot of hard work to tackle a complex project spanning many years, but he did it.

After the graduation I spent some time in New York, catching up with the team at Sloan Kettering. Our projects have been challenging, and not without some major bumps in the road, but this week things became simpler, and better. The feedback from the eventual users of our work is very positive. We’re definitely on a fruitful path, and we’ve been given permission to make our work as good as it needs to be. It feels like this week marked the crossing of some important threshold.

I also got to spend some time with my New York friends, who I miss quite a bit. It’s so cool to be able to come back, spend time with them, and hear about all their lives and their work. They’re building new devices, designing interfaces, writing software, building automata, writing about technology, and making all sorts of things work a little better. They are all so eager to learn, make, and understand more. I feel really refreshed and inspired, because of them.

Also this week: some perspective.

Throughout the universe, galaxies tend to clump together in massive structures that astronomers call superclusters. According to the new map, Earth’s galaxy lives near the edge of the Laniakea supercluster, which measures 500 million light-years in diameter and includes roughly 100,000 galaxies.

In a few weeks, Cliona and I will introduce a new little person into this sea of infinity. I am excited to meet whoever arrives.