Posts

WaveRNN

Figure 1: CVSSP's Pet Wyvern.

Introduction

I've been thinking about how we can make audio synthesis faster. In particular, I am interested in realtime soundscape synthesis, partly because I think it would be good for the project, and partly because it aligns well with my own research goals. I found two options:

Parallel WaveNet

For a while, I was looking at Parallel WaveNet. It can supposedly generate samples in realtime, and is now used in the Google Assistant. However, I have been unable to find a good, vanilla implementation of it, and the paper is sparse on details. There are projects on GitHub where people started implementing it more than 6 months ago and have not finished, so given the time constraints of this project, implementing it myself doesn't seem feasible. Moreover, the training process is very intricate and involves training a couple of networks separately and then together -- which makes it really har...

Audio by the Meter

Figure 1: Pinot Gallizio, 'Industrial Painting'

Introduction

After meeting with Will and Philip, I thought it would be a good idea to start trying to run some machine learning on RPPtv web servers, at least as a proof of concept. As a starting point, I thought it would be nice if people could go to a web page, enter how much audio they want to generate, and then be served a wav file. So I made the simplest page possible, just to get the pipeline flowing. It generates lakeside sounds from a model trained with WaveNet. The user can only specify the length of the recording. In my tests it takes about a minute and a half to generate 1 second of audio.

Installation

The code for this page lives in the ambisynth private repository. The main page is just simple HTML. On submit, the audio is generated by a CGI script written in Python. The Python script expects there to be a virtual environment where TensorFlow and librosa are installed. The repository contains an instal...
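The page-to-wav pipeline described above can be sketched in a few lines. This is a hypothetical illustration, not the actual ambisynth script: the real version invokes the trained WaveNet model, whereas here the audio payload is just silence of the requested length, and the parameter name `seconds` is an assumption.

```python
#!/usr/bin/env python3
# Sketch of a CGI endpoint that serves a generated wav file (illustrative only;
# the real ambisynth script runs a trained WaveNet model to produce the samples).
import io
import os
import sys
import wave
from urllib.parse import parse_qs

def make_wav(seconds, sr=16000):
    """Return a mono, 16-bit PCM WAV file as bytes (placeholder: silence)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(sr)
        # In the real pipeline, model output would be written here.
        w.writeframes(b"\x00\x00" * int(seconds * sr))
    return buf.getvalue()

if __name__ == "__main__" and "QUERY_STRING" in os.environ:
    # A CGI web server always sets QUERY_STRING; this guard keeps the script
    # importable for testing.
    qs = parse_qs(os.environ["QUERY_STRING"])
    seconds = float(qs.get("seconds", ["1"])[0])
    data = make_wav(seconds)
    sys.stdout.write("Content-Type: audio/wav\r\n")
    sys.stdout.write("Content-Length: %d\r\n\r\n" % len(data))
    sys.stdout.flush()
    sys.stdout.buffer.write(data)
```

The HTML form would submit the desired length as a query parameter, and the browser plays or downloads the returned wav.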

This Web Page Will Send You on the Adventure of a Lifetime - Find Out How

Figure 1: Interface for the synthesizer.

Introduction

The granular synthesizer I made for a previous post generated some interest, so I made a web-based version so it could be used more easily. I am not going to post a link to it here, because we might try to monetize it later, but I am creating this post as a way of documenting my work. The code for this demo is in a private repository.

Example 1: Audio generated by the synthesizer with the default settings.

Theory of Operation

The granular synthesizer loads in a corpus of recordings. It then randomly chooses small pieces (grains) out of the recordings, fades the pieces in and out, and pastes the pieces randomly into the output audio stream, potentially with the pieces overlapping one another. There is a good basic description of granular synth here. The pieces are inherently mono; if a piece comes out of a multichannel recording, then the piece is taken from one randomly selected channel. If the output strea...
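The choose-fade-paste loop described above can be sketched as follows. This is a minimal illustration of the technique, not the actual synthesizer's code; it assumes a corpus of mono numpy arrays at a common sample rate, uses a Hann window for the fades, and all the names are hypothetical.

```python
# Minimal granular-synthesis sketch (illustrative, not the ambisynth code).
import numpy as np

def granulate(corpus, out_len, n_grains, grain_len, rng=None):
    """Overlap-add randomly chosen, windowed grains into an output buffer."""
    rng = rng or np.random.default_rng()
    out = np.zeros(out_len)
    window = np.hanning(grain_len)  # fades each grain in and out
    for _ in range(n_grains):
        rec = corpus[rng.integers(len(corpus))]       # random recording
        start = rng.integers(len(rec) - grain_len)    # random grain position
        grain = rec[start:start + grain_len] * window
        pos = rng.integers(out_len - grain_len)       # random output position
        out[pos:pos + grain_len] += grain             # grains may overlap
    return out
```

With overlapping grains the summed amplitude can exceed the range of the inputs, so a real implementation would also normalize or limit the output.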

You Won't BELIEVE How I Profiled My Noise

Introduction

After my previous post about data augmentation, Philip suggested that noise profiling might be an effective way of augmenting data. The idea is that if we can extract the general background noise from a dataset, then we can synthesize more of it into all of the recordings. This would effectively scramble the last few bits of the data with plausible sounds while retaining the louder events. So I wrote a script for profiling the noise in a dataset, and it occurred to me that this, in its own right, might be a useful part of a suite of tools for soundscape synthesis, so it is getting its own post.

Algorithm

The script is part of the private ambisynth synth utilities. This is an outline of how it works.

Analysis

The script first analyzes the dataset to find a good noise sample. It does this by looking for the quietest moment in the corpus. I slide a 1-second window over the entire corpus, and for each window I calculate the perceptual loudness, as defined in Sect...
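The quietest-window search described above can be sketched like this. It is a simplified illustration, not the actual script: plain RMS stands in for the perceptual loudness measure the post refers to, the hop size is an assumption, and the function names are hypothetical.

```python
# Sketch of the sliding-window quietest-moment search (illustrative;
# RMS is used here as a simple stand-in for perceptual loudness).
import numpy as np

def quietest_window(signal, sr, win_seconds=1.0, hop_seconds=0.25):
    """Return the quietest `win_seconds`-long window in `signal`."""
    win = int(win_seconds * sr)
    hop = int(hop_seconds * sr)
    best_start, best_loudness = 0, np.inf
    for start in range(0, len(signal) - win + 1, hop):
        frame = signal[start:start + win]
        loudness = np.sqrt(np.mean(frame ** 2))  # RMS stand-in
        if loudness < best_loudness:
            best_start, best_loudness = start, loudness
    return signal[best_start:best_start + win]
```

In the real script each recording in the corpus would be scanned in turn and the globally quietest window kept as the noise sample.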

Easy Lifehacks to Augment your Data

Introduction

In a previous post, I examined how little data we need to train a model. The full dataset was 52 minutes long. The next logical question was whether we could get better results with less data by artificially augmenting a smaller dataset.

Previous Work

I found a paper called Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification. It presents a few simple methods for augmenting the type of data that we have, and demonstrates that they work. In particular, pitch shifting seems to work well, as does 'background noise', which involves mixing in another soundscape at low amplitude.

Datasets

NB: I removed the results that were previously posted here because I mistakenly trained with the wrong sample rate, which made them incomparable to the previous experiment. In the previous experiment, it seemed like 45 minutes of data was roughly where the network started making plausible cafe sounds. So in this experiment I started ...
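The 'background noise' augmentation mentioned above can be sketched as mixing a random segment of another soundscape into a recording at a chosen signal-to-noise ratio. This is an illustrative numpy-only sketch, not the paper's or my actual implementation, and the `snr_db` parameterization is an assumption.

```python
# Sketch of background-noise augmentation: mix another soundscape into a
# recording at low amplitude (illustrative; names are hypothetical).
import numpy as np

def mix_background(recording, background, snr_db=20.0, rng=None):
    """Add `background` to `recording`, scaled to roughly `snr_db` dB SNR."""
    rng = rng or np.random.default_rng()
    # Take a random background segment the same length as the recording.
    start = rng.integers(0, len(background) - len(recording) + 1)
    bg = background[start:start + len(recording)]
    sig_rms = np.sqrt(np.mean(recording ** 2))
    bg_rms = np.sqrt(np.mean(bg ** 2)) + 1e-12  # avoid division by zero
    gain = sig_rms / (bg_rms * 10 ** (snr_db / 20.0))
    return recording + gain * bg
```

Pitch shifting, the other augmentation the paper recommends, could be done with an off-the-shelf routine such as librosa's pitch-shift function rather than reimplemented by hand.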

Granular Synthesis

Introduction

I wanted to compare some of the sophisticated state-of-the-art soundscape synthesis methods we are using to some more traditional synthesis techniques. The obvious choice was granular synthesis. With a sufficiently large grain size, we should be able to pick bits and bobs out of a corpus of recordings and piece them together into a sort of patchwork pastiche. I couldn't find a granular synthesizer that would take an entire corpus of recordings, and I couldn't find one that could be scripted to quickly output recordings made with different corpora. And since I will gladly spend several hours doing something that will save me a few minutes of work later, I spent my Saturday evening writing one: https://github.com/michaelkrzyzaniak/granular_synth

Second Edit: I moved this to an even newer *private* repository with the other ambisynth utilities (with some updated features): the script is part of the private ambisynth synth utilities.

Some benefits and dra...

Listening Tests

I started making a series of listening tests. These might eventually help us figure out, e.g., the minimum acceptable audio sample rate, what words people would use to search for sounds, or, more generally, whether people find certain synthesis algorithms better than others, or as good as real recordings. They contain placeholder audio files right now, but I'm putting them here (a) for feedback, and (b) to help focus the discussion surrounding project outcomes. The code for the listening tests has its own repository: https://github.com/michaelkrzyzaniak/Listening_Tests/tree/master

Classification

Take this test here

Figure 1: Screenshots of the classification test and its results.

Discrimination

Take this test here

Figure 2: Screenshots of the discrimination test and its results.

Ranking

Take this test here

Figure 3: Screenshots of the ranking test.

Word Cloud

Take this test here

Figure 4: Screenshots of the word cloud test and its re...