Yong's Experiments
A former CVSSP employee whom I never met, Dr. Yong Xu, did some work on soundscape synthesis using SampleRNN. For posterity, here are the contents of a PowerPoint that I obtained from him.
More audio generation demos for different acoustic scene classes

Demo:
- Generated beach audio: successfully generated the audio!
- Generated park audio: some bird song is generated; successfully generated the audio!
- Generated restaurant/cafe audio: some human babble and the sound of glasses colliding are generated; successfully generated the audio!
- Generated piano: successfully generated the audio!

Compared with piano/speech generation using SampleRNN, the scene audio is more difficult to generate: negative log-likelihood of 2.8 for the scene audio vs. 1.0 for piano and 1.0 for speech.

Conclusions:
- The i-vector is more stable than the one-hot vector
- The quality of the generated audio is better
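The slides don't say how those negative log-likelihood figures were computed, but SampleRNN work typically reports a mean NLL per quantized output sample. Here is a minimal sketch of that calculation; the `log_probs`/`targets` names and the 8-bit quantization are my assumptions, not anything from the slides:

```python
import numpy as np

def mean_nll_per_sample(log_probs, targets):
    """Mean negative log-likelihood per audio sample.

    log_probs: (T, Q) log-probabilities over Q quantization bins
               (e.g. Q = 256 for 8-bit audio) at each of T timesteps.
    targets:   (T,) integer bin index of the sample actually observed.
    """
    # Pick out the log-probability the model assigned to each
    # observed sample, then average and negate.
    return -np.mean(log_probs[np.arange(len(targets)), targets])
```

Whether the 2.8/1.0/1.0 figures are in bits or nats isn't stated, but the gap supports the slide's point either way: the model is far less certain about the next sample of a soundscape than of piano or speech.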
Unfortunately, I don't have much additional contextual information about how these were made. I know that he was working with this implementation, and was using the DCASE 2016 task 1 dataset. I also know that he was working to modify SampleRNN to accommodate different categories of soundscape via global conditioning, but a former colleague of his told me that these particular clips were generated by training individual models, each covering only one type of soundscape.
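For what it's worth, here is a rough sketch of what "global conditioning" on a scene class might look like in a SampleRNN-style model. This is my own illustration in PyTorch, not Yong's code, and a real SampleRNN has multiple tiers plus a sample-level MLP that I'm glossing over:

```python
import torch
import torch.nn as nn

class ConditionedFrameRNN(nn.Module):
    """Frame-level RNN with a global (per-clip) conditioning vector.

    The conditioning vector -- a one-hot scene label or an i-vector,
    the two options the slides compare -- is concatenated to every
    input frame, so one model can cover all scene classes. All names
    and sizes here are illustrative.
    """

    def __init__(self, frame_size=16, cond_dim=15, hidden=512, n_bins=256):
        super().__init__()
        # cond_dim=15 would match a one-hot label over the 15 scene
        # classes of DCASE 2016 task 1 (my assumption).
        self.rnn = nn.GRU(frame_size + cond_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_bins)  # logits over quantized sample values

    def forward(self, frames, cond):
        # frames: (batch, n_frames, frame_size) quantized audio frames
        # cond:   (batch, cond_dim), held fixed for the whole clip
        cond = cond.unsqueeze(1).expand(-1, frames.size(1), -1)
        h, _ = self.rnn(torch.cat([frames, cond], dim=-1))
        return self.out(h)
```

Training one model per soundscape, as his former colleague described, would simply drop the `cond` input and fit a separate unconditional model on each class's recordings.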