Comparisons of WaveNet and SampleRNN using DCASE 2016 Lakeside
Before doing anything too wild, I wanted to reproduce some of Yong's results and explore them a little further. In particular, I wanted to train WaveNet and SampleRNN on the "Lakeside beach (outdoor)" scenes from the DCASE 2016 Task 1 dataset (henceforth the "Lakeside dataset"), just as I did with the Beethoven dataset before. The Lakeside dataset contains 312 10-second audio clips, totaling 52 minutes of audio.

Reference Samples

For reference, here are a couple of representative samples from the original DCASE dataset, at the reduced quality that I used for training (16-bit, 16 kHz).

Example 1: Audio clips taken from the Lakeside dataset

Also for reference, here again is the Lakeside clip that Yong generated with SampleRNN.

Example 2: Yong's generated sample

SampleRNN

The DCASE 2016 audio files are broken into 10-second segments. SampleRNN, by default, breaks longer f