Yong's Experiments

A previous employee of CVSSP whom I never met, Dr. Yong Xu, did some work on soundscape synthesis using SampleRNN. For posterity, here are the contents of a PowerPoint that I obtained from him.



More audio generation demos for different acoustic scene classes (demo)
Generated restaurant/cafe audio

Conclusions:
  1. conditioning on an i-vector is more stable than conditioning on a one-hot vector
  2. the quality of the generated audio is better with the i-vector

Generated beach audio: successfully generated the audio!

Generated park audio: some bird song is generated. Successfully generated the audio!

Generated cafe/restaurant audio: some human speech babble and glass-colliding sounds are generated. Successfully generated the audio!

Compared with the piano/speech generation using SampleRNN: soundscape audio is more difficult to generate, with a negative log-likelihood of 2.8, vs. 1.0 for piano and 1.0 for speech.

Generated piano: successfully generated the audio!
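A note on those negative log-likelihood figures: SampleRNN quantizes each audio sample into one of 256 levels and models it with a softmax, so the NLL is just the average negative log-probability the model assigns to the true quantization bin. The slides don't say whether the numbers are in bits or nats; here is a minimal sketch of the bits-per-sample version (the function and argument names are mine, not from his code):

```python
import numpy as np

def mean_nll_bits(probs: np.ndarray, targets: np.ndarray) -> float:
    """Average negative log-likelihood, in bits per sample.

    probs:   (num_samples, 256) softmax outputs, one row per audio sample
    targets: (num_samples,) integer quantization bin of each true sample
    """
    p = probs[np.arange(len(targets)), targets]  # probability of the true bin
    return float(-np.mean(np.log2(p + 1e-12)))   # 1e-12 guards against log2(0)
```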




Unfortunately I don't have much additional context about how these were made. I know that he was working with this implementation and that he was using the DCASE 2016 Task 1 dataset. I also know that he was working to modify SampleRNN to accommodate different categories of soundscape via global conditioning, but a former colleague of his told me that these examples were generated by training individual models, each accommodating only one type of soundscape.
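To illustrate what global conditioning means here: a fixed per-clip vector, either a one-hot scene label or an i-vector, is projected and added to the input of a frame-level RNN tier at every timestep, so that a single model can generate all of the scene classes. This is a sketch of the general technique in PyTorch, not Yong's actual code (which I don't have); all names are hypothetical:

```python
import torch
import torch.nn as nn

class GloballyConditionedTier(nn.Module):
    """A SampleRNN-style frame-level RNN tier with global conditioning.

    `cond` is one fixed vector per clip: a one-hot scene label or an
    i-vector. It is projected and added to every frame's input, so a
    single model can generate multiple soundscape classes.
    """
    def __init__(self, frame_size: int, cond_size: int, hidden_size: int = 512):
        super().__init__()
        self.frame_proj = nn.Linear(frame_size, hidden_size)
        self.cond_proj = nn.Linear(cond_size, hidden_size)
        self.rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, frames, cond, hidden=None):
        # frames: (batch, num_frames, frame_size), cond: (batch, cond_size)
        x = self.frame_proj(frames) + self.cond_proj(cond).unsqueeze(1)
        return self.rnn(x, hidden)

# Usage: condition on a one-hot label over the 15 DCASE 2016 Task 1
# scene classes.
tier = GloballyConditionedTier(frame_size=16, cond_size=15)
frames = torch.randn(1, 10, 16)
cond = torch.zeros(1, 15)
cond[0, 0] = 1.0  # hypothetical index for "beach"
output, hidden = tier(frames, cond)
```

Training separate per-scene models, as his former colleague described, amounts to dropping the cond pathway entirely and training on a single scene class at a time.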
