Granular Synthesis
Introduction
I wanted to compare some of the sophisticated state-of-the-art soundscape synthesis methods we are using to some more traditional synthesis techniques. The obvious choice was granular synthesis: with a sufficiently large grain size, we should be able to pick bits and bobs out of a corpus of recordings and piece them together into a sort of patchwork pastiche.
I couldn't find a granular synthesizer that would take an entire corpus of recordings, nor one that could be scripted to quickly output recordings made with different corpora. And since I will gladly spend several hours doing something that will save me a few minutes of work later, I spent my Saturday evening writing one:
Second Edit: I have since moved the script to an even newer *private* repository with the other ambisynth utilities (with some updated features); it is now part of the private ambisynth synth utilities.
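The script itself lives in the private repo, but the core idea fits in a few lines. Below is a minimal sketch, assuming a directory of mono WAV files at a common sample rate; the function name, parameter defaults, and the simple Hanning window are my illustrative choices here, not necessarily what the ambisynth script does:

```python
import glob
import random
import numpy as np
import soundfile as sf  # assumed for WAV I/O

def granulate_corpus(corpus_dir, out_seconds=30.0, sr=44100,
                     mean_grain_len=3.6, grain_len_std=0.5,
                     mean_spacing=0.95, spacing_std=0.2):
    """Overlap-add randomly chosen grains from a folder of mono WAV files."""
    files = [sf.read(p)[0] for p in sorted(glob.glob(corpus_dir + "/*.wav"))]
    out = np.zeros(int(out_seconds * sr))
    t = 0.0
    while t < out_seconds:
        src = random.choice(files)                              # random recording
        n = max(int(random.gauss(mean_grain_len, grain_len_std) * sr), sr // 10)
        start = random.randint(0, max(len(src) - n, 0))         # random position
        grain = np.asarray(src[start:start + n], dtype=float)
        grain = grain * np.hanning(len(grain))                  # window the grain
        i = int(t * sr)
        j = min(i + len(grain), len(out))
        out[i:j] += grain[:j - i]                               # overlap-add
        t += max(random.gauss(mean_spacing, spacing_std), 0.05) # jittered spacing
    return out

# e.g. sf.write("beach_granular.wav", granulate_corpus("corpus/beach"), 44100)
```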
Some benefits and drawbacks of this method compared to the machine learning methods are as follows.
Benefits of Granular Synthesis
- easily runs in real time
- can work at high sample rates and bit depths
- can crossfade between scenes by choosing grains from two corpora with varying probabilities (see the sketch after this list)
- does not require a GPU (can easily run on a phone)
- does not require lengthy training
- could easily run in a web browser with a nice interface
- handles multichannel audio (but probably increases the overall sound density as a side effect)
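To illustrate the crossfade point: here is a hedged sketch in which the probability of drawing each grain from scene B rises linearly over the output, so the texture morphs from corpus A into corpus B. The helper `extract_grain`, the parameter values, and the assumption that `files_a`/`files_b` are lists of mono sample arrays are all mine for illustration:

```python
import random
import numpy as np

def extract_grain(files, n):
    """Pull up to n samples from a random position in a random mono recording."""
    src = random.choice(files)
    start = random.randint(0, max(len(src) - n, 0))
    g = np.asarray(src[start:start + n], dtype=float)
    return g * np.hanning(len(g))

def crossfade_scenes(files_a, files_b, out_seconds=30.0, sr=44100,
                     grain_len=3.6, spacing=0.95):
    """Morph from scene A to scene B by varying the corpus-choice probability."""
    out = np.zeros(int(out_seconds * sr))
    t = 0.0
    while t < out_seconds:
        p_b = t / out_seconds                       # probability of scene B: 0 -> 1
        files = files_b if random.random() < p_b else files_a
        grain = extract_grain(files, int(grain_len * sr))
        i = int(t * sr)
        j = min(i + len(grain), len(out))
        out[i:j] += grain[:j - i]                   # overlap-add
        t += spacing
    return out
```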
Drawbacks of Granular Synthesis
- not as sexy as machine learning
- the model will have to include all of the source audio, potentially making it very large
- lacks continuity, e.g. you might hear a noisy car pop into existence and spontaneously disappear after half a second
- one of two things happens:
  - the grains are all the same duration and evenly spaced, so the resulting texture has constant density, but it starts to sound rhythmic and periodic
  - there is variability in the grain spacing and placement, which masks the periodicity but creates notably uneven density
- repetitions might become apparent, e.g. if there is a notable bird sound in the corpus, you might notice it repeated exactly if you listen for a long time
Results
I used my code to generate a few 30-second samples from each of the 15 classes in DCASE 2016. I used these settings:
- mean_grain_length = 3.598934 seconds
- grain_length_std_dev = 0.5 seconds
- mean_grain_spacing = 0.948866 seconds
- grain_spacing_std_dev = 0.2 seconds
- window_function = trapezoidal
- trapezoidal_fade_time = 0.349977 seconds
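For reference, here is a hedged sketch of how I would interpret these settings: grain lengths and spacings drawn from Gaussians with the means and standard deviations above, and each grain shaped by a trapezoidal window with linear fades. The function names and the clamping values are mine; the private script may implement this differently:

```python
import random
import numpy as np

SR = 44100  # assumed sample rate

def trapezoidal_window(n, fade_sec=0.349977, sr=SR):
    """Linear fade-in, flat sustain, linear fade-out over n samples."""
    f = min(int(fade_sec * sr), n // 2)          # clamp fades to half the grain
    env = np.ones(n)
    env[:f] = np.linspace(0.0, 1.0, f, endpoint=False)
    env[n - f:] = np.linspace(1.0, 0.0, f)
    return env

def next_grain_params(sr=SR):
    """Sample one grain length and the spacing to the next grain, in samples."""
    length = random.gauss(3.598934, 0.5)     # mean_grain_length, grain_length_std_dev
    spacing = random.gauss(0.948866, 0.2)    # mean_grain_spacing, grain_spacing_std_dev
    return int(max(length, 0.8) * sr), int(max(spacing, 0.05) * sr)  # keep values sane
```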
id | class | audio |
---|---|---|
001 | beach | |
002 | bus | |
003 | cafe_restaurant | |
004 | car | |
005 | city_center | |
006 | forest_path | |
007 | grocery_store | |
008 | home | |
009 | library | |
010 | metro_station | |
011 | office | |
012 | park | |
013 | residential_area | |
014 | train | |
015 | tram | |
Example 1: Audio made by Granular Synthesis using each of the 15 classes from DCASE 2016 as the source material.
Michael, a very interesting post, thank you.
Listened to the sample files and I am impressed by the results.
August is a bad time as so many people are on holiday right now, but I will get some of our sound designers to check these out with your listening tests.
Once again thanks for the post, really good.
These will make a very useful baseline for comparison with the "sexy" machine learning techniques, which have quite different kinds of artefact. Not bad for an evening's work!
Some are pretty impressive Michael, thank you.