Return of the Single Classes (SampleRNN)
Introduction

After some disappointing results on single-class WaveRNN models, I thought I should repeat the same experiment with a different architecture. Everybody keeps asking why Yong was getting much better results. He was using SampleRNN, which I have been avoiding: even though all of these architectures are painfully slow at generating audio, SampleRNN is the only one for which I haven't seen any work suggesting that it might eventually be made faster. It is difficult for me to imagine a viable service where people have to wait several hours to generate a few minutes of 8-bit audio; at that point it might make more sense to generate the audio in advance so that people can download it as needed. If we are going to do that, why don't we just put an internet-connected microphone in the woods and stream high-quality WAV files to our servers? Without the ability for a sound designer to adjust the synthesis parameters, manipulate the model in real time, blen...