Yong's Experiments

A previous employee of CVSSP whom I never met, Dr. Yong Xu, did some work on soundscape synthesis using SampleRNN. For posterity, here are the contents of a PowerPoint that I obtained from him.



More audio generation demos for different acoustic scene classes (demo)
Generated restaurant/cafe audio

Conclusions:
  1. conditioning on an i-vector is more stable than conditioning on a one-hot vector
  2. the quality of the generated audio is better with the i-vector

Generated beach audio: successfully generated the audio!

Generated park audio: some bird song is generated. Successfully generated the audio!

Generated cafe/restaurant audio: some human speech babble and glass-colliding sounds are generated. Successfully generated the audio!

Compared with the piano/speech generation using SampleRNN: soundscape audio is more difficult to generate, with a negative log-likelihood of 2.8, vs. 1.0 for piano and 1.0 for speech.

Generated piano: successfully generated the audio!
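A note on those negative log-likelihood figures: SampleRNN quantizes each audio sample into one of 256 levels and models it with a softmax, so the NLL is just the average negative log-probability the model assigns to the true quantization bin. The slides don't say whether the numbers are in bits or nats; here is a minimal sketch of the bits-per-sample version (the function and argument names are mine, not from his code):

```python
import numpy as np

def mean_nll_bits(probs: np.ndarray, targets: np.ndarray) -> float:
    """Average negative log-likelihood, in bits per sample.

    probs:   (num_samples, 256) softmax outputs, one row per audio sample
    targets: (num_samples,) integer quantization bin of each true sample
    """
    p = probs[np.arange(len(targets)), targets]  # probability of the true bin
    return float(-np.mean(np.log2(p + 1e-12)))   # 1e-12 guards against log2(0)
```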




Unfortunately I don't have much additional context about how these were made. I know that he was working with this implementation and that he was using the DCASE 2016 Task 1 dataset. I also know that he was working to modify SampleRNN to accommodate different categories of soundscape via global conditioning, but a former colleague of his told me that these examples were generated by training individual models, each accommodating only one type of soundscape.
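To illustrate what global conditioning means here: a fixed per-clip vector, either a one-hot scene label or an i-vector, is projected and added to the input of a frame-level RNN tier at every timestep, so that a single model can generate all of the scene classes. This is a sketch of the general technique in PyTorch, not Yong's actual code (which I don't have); all names are hypothetical:

```python
import torch
import torch.nn as nn

class GloballyConditionedTier(nn.Module):
    """A SampleRNN-style frame-level RNN tier with global conditioning.

    `cond` is one fixed vector per clip: a one-hot scene label or an
    i-vector. It is projected and added to every frame's input, so a
    single model can generate multiple soundscape classes.
    """
    def __init__(self, frame_size: int, cond_size: int, hidden_size: int = 512):
        super().__init__()
        self.frame_proj = nn.Linear(frame_size, hidden_size)
        self.cond_proj = nn.Linear(cond_size, hidden_size)
        self.rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, frames, cond, hidden=None):
        # frames: (batch, num_frames, frame_size), cond: (batch, cond_size)
        x = self.frame_proj(frames) + self.cond_proj(cond).unsqueeze(1)
        return self.rnn(x, hidden)

# Usage: condition on a one-hot label over the 15 DCASE 2016 Task 1
# scene classes.
tier = GloballyConditionedTier(frame_size=16, cond_size=15)
frames = torch.randn(1, 10, 16)
cond = torch.zeros(1, 15)
cond[0, 0] = 1.0  # hypothetical index for "beach"
output, hidden = tier(frames, cond)
```

Training separate per-scene models, as his former colleague described, amounts to dropping the cond pathway entirely and training on a single scene class at a time.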
