Granular Synthesis

Introduction

I wanted to compare some of the state-of-the-art soundscape synthesis methods we are using to some more traditional synthesis techniques. The obvious choice was granular synthesis. With a sufficiently large grain size, we should be able to pick bits and bobs out of a corpus of recordings and piece them together into a sort of patchwork pastiche.

I couldn't find a granular synthesizer that would take an entire corpus of recordings, and I couldn't find one that could be scripted to quickly output recordings made with different corpora. And since I will gladly spend several hours doing something that will save me a few minutes of work later, I spent my Saturday evening writing one:

https://github.com/michaelkrzyzaniak/granular_synth
Second edit: I have since moved this to an even newer *private* repository with the other ambisynth utilities (and some updated features); the script is now part of the private ambisynth synth utilities.
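Since that repository is private, here is a minimal sketch of the core idea (not the repository's actual code): it assumes the corpus has already been loaded as a list of mono numpy arrays sharing one sample rate, draws grains of random length from random recordings, windows them, and overlap-adds them at randomized spacings. A Hann window stands in here for the trapezoidal window used in the settings further down.

  import numpy as np

  def granular_patchwork(corpus, sample_rate, duration_sec,
                         mean_grain_len=3.6, grain_len_std=0.5,
                         mean_spacing=0.95, spacing_std=0.2, seed=None):
      """Overlap-add randomly chosen, windowed grains from random recordings."""
      rng = np.random.default_rng(seed)
      out = np.zeros(int(duration_sec * sample_rate))
      t = 0.0
      while t < duration_sec:
          # Draw a grain length and pick a random excerpt from a random recording.
          grain_len = max(0.1, rng.normal(mean_grain_len, grain_len_std))
          n = int(grain_len * sample_rate)
          source = corpus[rng.integers(len(corpus))]
          if len(source) > n:
              start = rng.integers(len(source) - n)
              grain = source[start:start + n] * np.hanning(n)
              # Mix the windowed grain into the output at the current write position.
              offset = int(t * sample_rate)
              end = min(offset + n, len(out))
              out[offset:end] += grain[:end - offset]
          # Advance by a randomized spacing so grains overlap irregularly.
          t += max(0.05, rng.normal(mean_spacing, spacing_std))
      return out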



Some benefits and drawbacks of this method compared to the machine learning methods are as follows.

Benefits of Granular Synthesis

  • easily runs in real time
  • can work at high sample rates and bit depths
  • can crossfade between scenes by choosing grains from 2 corpora with varying probabilities (see the sketch after this list)
  • does not require a GPU (can easily run on a phone)
  • does not require lengthy training
  • could easily run in a web browser with a nice interface
  • handles multichannel audio (but probably increases the overall sound density as a side effect)
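
As a hypothetical illustration of the crossfading point above, the per-grain corpus choice can be driven by a time-varying probability. This sketch reuses the structure of the earlier one and fades linearly from scene A to scene B over the length of the output:

  import numpy as np

  def crossfade_patchwork(corpus_a, corpus_b, sample_rate, duration_sec, seed=None):
      """Draw each grain from corpus A or B; the probability of B rises from 0 to 1."""
      rng = np.random.default_rng(seed)
      out = np.zeros(int(duration_sec * sample_rate))
      t = 0.0
      while t < duration_sec:
          mix = t / duration_sec  # 0 at the start, 1 at the end
          corpus = corpus_b if rng.random() < mix else corpus_a
          n = int(max(0.1, rng.normal(3.6, 0.5)) * sample_rate)
          source = corpus[rng.integers(len(corpus))]
          if len(source) > n:
              start = rng.integers(len(source) - n)
              offset = int(t * sample_rate)
              end = min(offset + n, len(out))
              out[offset:end] += (source[start:start + n] * np.hanning(n))[:end - offset]
          t += max(0.05, rng.normal(0.95, 0.2))
      return out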

Drawbacks of Granular Synthesis

  • not as sexy as machine learning
  • the model will have to include all of the source audio, potentially making it very large
  • lacks continuity, e.g. you might hear a noisy car pop into existence and spontaneously disappear after half a second
  • one of two things happens (both strategies are sketched after this list)
    • the grains are all the same duration and evenly placed, so the resulting texture has constant density, but it starts to sound rhythmic and periodic
    • there is variability in the grain spacing and placement, which masks the periodicity but creates noticeably uneven density
  • repetitions might become apparent, e.g. if there is a notable bird sound in the corpus, you might notice it repeated exactly if you listen for a long time
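
To make that density trade-off concrete, here is a small hypothetical sketch of the two onset-scheduling strategies: a fixed grid gives constant density but audible periodicity, while jittered gaps mask the periodicity at the cost of uneven density.

  import numpy as np

  rng = np.random.default_rng(0)
  duration, spacing, jitter = 30.0, 0.95, 0.2

  # Strategy 1: evenly spaced grain onsets -- constant density, but periodic.
  periodic_onsets = np.arange(0.0, duration, spacing)

  # Strategy 2: jittered gaps between onsets -- aperiodic, but the local
  # grain density fluctuates, so some moments are crowded and others sparse.
  gaps = np.clip(rng.normal(spacing, jitter, size=64), 0.05, None)
  jittered_onsets = np.cumsum(gaps)
  jittered_onsets = jittered_onsets[jittered_onsets < duration]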

Results

I used my code to generate a few 30-second samples from each of the 15 classes in DCASE 2016, with these settings:
  • mean_grain_length = 3.598934 seconds
  • grain_length_std_dev = 0.5 seconds
  • mean_grain_spacing = 0.948866 seconds
  • grain_spacing_std_dev = 0.2 seconds
  • window_function = trapezoidal (sketched after this list)
  • trapezoidal_fade_time = 0.349977 seconds
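
For reference, a trapezoidal grain window like the one named in these settings could look like the following hypothetical sketch (the actual window in the private code may differ): a linear fade in, a flat sustain, and a linear fade out, with the fade length set by trapezoidal_fade_time.

  import numpy as np

  def trapezoidal_window(n_samples, fade_samples):
      """Flat-topped grain window: linear fade in, unity sustain, linear fade out."""
      fade = min(fade_samples, n_samples // 2)
      w = np.ones(n_samples)
      ramp = np.linspace(0.0, 1.0, fade, endpoint=False)
      w[:fade] = ramp
      w[n_samples - fade:] = ramp[::-1]
      return w

  # e.g. a mean-length grain at 44.1 kHz with the fade time above:
  # w = trapezoidal_window(int(3.598934 * 44100), int(0.349977 * 44100))
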
These were the results.

id   class
001  beach
002  bus
003  cafe_restaurant
004  car
005  city_center
006  forest_path
007  grocery_store
008  home
009  library
010  metro_station
011  office
012  park
013  residential_area
014  train
015  tram

Example 1: Audio made by granular synthesis using each of the 15 classes from DCASE 2016 as the source material.

Future Work

I want to plug recordings like these into my listening tests so we can see how they compare to other methods.

Comments

  1. Michael, a very interesting post, thank you.
    Listened to the sample files and I am impressed by the results.
    August is a bad time as so many people are on holiday right now, but I will get some of our sound designers to check these out with your listening tests.
    Once again thanks for the post, really good.

  2. These will make a very useful baseline for comparison with the "sexy" machine learning techniques, which have quite different kinds of artefact. Not bad for an evening's work!

  3. Some are pretty impressive Michael, thank you.
