
Showing posts from August, 2018

You Won't BELIEVE How I Profiled My Noise

Introduction

After my previous post about data augmentation, Philip suggested that noise profiling might be an effective way of augmenting data. The idea is that if we can extract the general background noise from a dataset, then we can synthesize more of it and mix it into all of the recordings. This would effectively scramble the last few bits of the data with plausible sounds while retaining the louder events. So I wrote a script for profiling the noise in a dataset, and it occurred to me that this, in its own right, might be a useful part of a suite of tools for soundscape synthesis, so it is getting its own post.

Algorithm

The script is part of the private ambisynth synth utilities. This is an outline of how it works.

Analysis

The script first analyzes the dataset to find a good noise sample. It does this by looking for the quietest moment in the corpus. I slide a 1-second window over the entire corpus, and for each window I calculate the perceptual loudness, as defined in Sect…
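The excerpt is cut off before it names the loudness measure, but the quietest-window search itself is straightforward. Below is a minimal sketch of that analysis step, assuming the corpus is a directory of WAV files and using RMS energy as a stand-in for the perceptual loudness measure; the function name and hop size are my own illustrative choices, not part of the (private) ambisynth script.

```python
# Minimal sketch of the quietest-window search described above.
# Assumptions (not from the post): the corpus is a directory of WAV
# files, and RMS energy stands in for the perceptual loudness measure
# the post cites but is truncated before naming.
import glob
import numpy as np
import soundfile as sf

def quietest_window(corpus_dir, hop_seconds=0.25):
    best = (np.inf, None, None)  # (loudness, file, start_sample)
    for path in glob.glob(f"{corpus_dir}/*.wav"):
        audio, sr = sf.read(path)
        if audio.ndim > 1:                      # mix down to mono
            audio = audio.mean(axis=1)
        win, hop = sr, int(sr * hop_seconds)    # 1-second window
        for start in range(0, len(audio) - win + 1, hop):
            frame = audio[start:start + win]
            rms = np.sqrt(np.mean(frame ** 2))  # loudness proxy
            if rms < best[0]:
                best = (rms, path, start)
    return best

loudness, path, start = quietest_window("corpus")
print(f"quietest second: {path} at sample {start} (rms={loudness:.6f})")
```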

Easy Lifehacks to Augment your Data

Introduction

In a previous post, I examined how little data we need to train a model. The full dataset was 52 minutes long. The next logical question was whether we could get better results with less data by artificially augmenting a smaller dataset.

Previous Work

I found a paper called Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification. It presents a few simple methods for augmenting the type of data that we have, and demonstrates that they work. In particular, pitch shifting seems to work well, as does 'background noise', which involves mixing in another soundscape at low amplitude.

Datasets

NB: I removed the results that were previously posted here because I mistakenly trained with the wrong sample rate, which made them incomparable to the previous experiment. In the previous experiment, it seemed like 45 minutes of data was roughly where the network started making plausible cafe sounds. So in this experiment I started…
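For concreteness, here is a minimal sketch of the two augmentations the paper reports working well: pitch shifting and low-amplitude background-noise mixing. The file names, the ±2-semitone range, and the 0.1 mixing gain are my own illustrative assumptions, not values taken from the post or the paper.

```python
# Minimal sketch of the two augmentations discussed above: pitch shifting
# and background-noise mixing. File names, the +/-2 semitone range, and
# the 0.1 mix gain are illustrative choices, not values from the post.
import numpy as np
import librosa
import soundfile as sf

y, sr = librosa.load("cafe_clip.wav", sr=16000)
noise, _ = librosa.load("other_scene.wav", sr=16000)

# Pitch shift by a random amount within +/- 2 semitones.
n_steps = np.random.uniform(-2.0, 2.0)
shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)

# Mix in another soundscape at low amplitude.
noise = np.resize(noise, len(shifted))              # loop/trim to match length
augmented = shifted + 0.1 * noise
augmented /= max(1.0, np.abs(augmented).max())      # avoid clipping

sf.write("cafe_clip_augmented.wav", augmented, sr)
```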

Granular Synthesis

Introduction

I wanted to compare some of the sophisticated state-of-the-art soundscape synthesis methods we are using with some more traditional synthesis techniques. The obvious choice was granular synthesis. With a sufficiently large grain size, we should be able to pick bits and bobs out of a corpus of recordings and piece them together into a sort of patchwork pastiche. I couldn't find a granular synthesizer that would take an entire corpus of recordings, and I couldn't find one that could be scripted to quickly output recordings made with different corpora. And since I will gladly spend several hours doing something that will save me a few minutes of work later, I spent my Saturday evening writing one: https://github.com/michaelkrzyzaniak/granular_synth

Second edit: I moved this to an even newer private repository with the other ambisynth utilities (with some updated features); the script is now part of the private ambisynth synth utilities. Some benefits and dra…
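Since the script itself now lives in a private repository, here is only a rough sketch of the corpus-scale granular approach described above: draw random, large grains from random files in a corpus and overlap-add them with a smoothing envelope. The grain size, overlap, and Hann envelope are my own choices, not necessarily the script's; it also assumes all corpus files share the target sample rate.

```python
# Minimal sketch of corpus-scale granular synthesis: grab random large
# grains from random files in a corpus and overlap-add them into an
# output buffer. Grain size, overlap, and envelope are illustrative
# choices; this is not the (private) ambisynth implementation. Assumes
# all corpus files are already at the target sample rate.
import glob
import random
import numpy as np
import soundfile as sf

def granular_patchwork(corpus_dir, out_seconds=30.0, grain_seconds=1.0,
                       overlap=0.25, sr=16000):
    files = glob.glob(f"{corpus_dir}/*.wav")
    grain_len = int(grain_seconds * sr)
    hop = int(grain_len * (1.0 - overlap))
    out = np.zeros(int(out_seconds * sr) + grain_len)
    env = np.hanning(grain_len)                     # smooth the grain edges
    for start in range(0, int(out_seconds * sr), hop):
        audio, _ = sf.read(random.choice(files))
        if audio.ndim > 1:                          # mix down to mono
            audio = audio.mean(axis=1)
        pos = random.randint(0, max(0, len(audio) - grain_len))
        grain = audio[pos:pos + grain_len]
        out[start:start + len(grain)] += grain * env[:len(grain)]
    return out[:int(out_seconds * sr)], sr

audio, sr = granular_patchwork("corpus")
sf.write("patchwork.wav", audio / max(1.0, np.abs(audio).max()), sr)
```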

Listening Tests

I started making a series of listening tests. These might eventually help us figure out, for example, the minimum acceptable audio sample rate, what words people would use to search for sounds, or, more generally, whether people find certain synthesis algorithms better than others, or as good as real recordings. They contain placeholder audio files right now, but I'm putting them here (a) for feedback, and (b) to help focus the discussion surrounding project outcomes. The code for the listening tests has its own repository: https://github.com/michaelkrzyzaniak/Listening_Tests/tree/master

Classification

Take this test here.
Figure 1: Screenshots of the classification test and its results.

Discrimination

Take this test here.
Figure 2: Screenshots of the discrimination test and its results.

Ranking

Take this test here.
Figure 3: Screenshots of the ranking test.

Word Cloud

Take this test here.
Figure 4: Screenshots of the word cloud test and its results.

How Little Data Do We Need to Train SampleRNN?

In my previous experiments, I trained SampleRNN and WaveNet on a 52-minute dataset and got satisfactory results. However, we wanted to know how the models perform with less data. I trained SampleRNN using the Cafe scenes from DCASE 2016.

Reference Samples

For reference, here are a couple of representative samples from the original DCASE dataset, with the reduced quality that I used for training (16-bit, 16 kHz).

Example 1: Audio clips taken from the Cafe dataset

Samples

I trained 6 separate SampleRNN models with varying amounts of data, each for 100 000 iterations (about 31 hours per model). By "iterations", I mean that for all models, I set the minibatch size to 52 audio files and trained for 100 000 minibatches. This means that for the smallest trial, each audio file was presented to the network 100 000 times, and in the largest trial, which has 6 times as many audio files, each file was presented to the network…
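The excerpt is truncated before the final figure, but the presentation count follows from the stated setup. As a back-of-the-envelope sketch, assuming each 52-file minibatch draws uniformly from the dataset (the last number is my own arithmetic, not a quote from the post):

```python
# Back-of-the-envelope check of the presentation counts described above.
# Assumes each minibatch draws 52 files uniformly from the dataset; the
# post is truncated before giving the figure for the largest trial, so
# that number is my own arithmetic, not a quote.
minibatch_files = 52
minibatches = 100_000

for num_files in (52, 6 * 52):
    presentations = minibatches * minibatch_files / num_files
    print(f"{num_files} files -> ~{presentations:,.0f} presentations per file")

# 52 files  -> ~100,000 presentations per file
# 312 files -> ~16,667 presentations per file
```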

Minimum computer specs for training our models

A couple of people have built PCs for machine learning, running software tools similar to ours. They report speeds about 20-30x faster than my laptop, which I think would translate to very roughly 1.5 days to train the models that take about 1 day on the condor servers. These are their specs:

SELF-BUILT, $883
https://www.oreilly.com/learning/build-a-super-fast-deep-learning-machine-for-under-1000
- GeForce GTX 1060 3 GB (should have gotten 6 GB)
- 1 TB SATA drive
- Intel i5-6600
- 2 x 8 GB RAM
- ASUS Mini ITX DDR4 LGA 1151 B150I PRO GAMING/WIFI/AURA motherboard

SELF-BUILT, $1700
https://blog.slavv.com/the-1700-great-deep-learning-box-assembly-setup-and-benchmarks-148c5ebe6415
- GTX 1080 Ti
- Intel i5 7500 (maybe should have gotten a CPU with 40 PCIe lanes)
- 2 sticks of 16 GB RAM
- 2 TB Seagate HDD
- 480 GB MyDigital SSD
- MSI X99A SLI PLUS motherboard

Dell has a PC with similar specs for about the same price; this is probably what I would recommend at this point:

DELL, $999.99
https://www…