You Won't BELIEVE How I Profiled My Noise


Introduction

After my previous post about data augmentation, Philip suggested that noise profiling might be an effective way of augmenting data. The idea is that if we can extract the general background noise from a dataset, then we can synthesize more of it and mix it into all of the recordings. This would effectively scramble the least significant few bits of the data with plausible sounds while retaining the louder events. So I wrote a script for profiling the noise in a dataset, and it occurred to me that this, in its own right, might be a useful part of a suite of tools for soundscape synthesis, so it is getting its own post.

Algorithm

The script is part of the private ambisynth synth utilities.

This is an outline of how it works.

Analysis

The script first analyzes the dataset to find a good noise sample. It does this by looking for the quietest moment in the corpus. I slide a 1-second window over the entire corpus, and for each window I calculate the perceptual loudness, as defined in Section 8.1.1 of this paper, which involves summing the individual loudnesses across all of the Bark bands. I define the 'noise profile' to be the 1-second window with the least perceptual loudness.
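For illustration, here is a minimal Python sketch of the window search. It is not the ambisynth code, and all of the names in it (find_noise_profile, crude_loudness, hop_seconds) are hypothetical. The loudness measure is a crude stand-in rather than the definition from the paper: it groups the power spectrum into 24 Bark bands using Traunmüller's approximation and sums the band energies after raising each to a compressive exponent (0.23, which is an assumption). The hop between windows is also an assumption, since a real corpus may be too large to test every sample offset.

```python
import numpy as np

def bark(f_hz):
    # Traunmüller's approximation of the Bark scale.
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def crude_loudness(window, sample_rate):
    # Stand-in for perceptual loudness (NOT the paper's definition):
    # sum of compressed energies over 24 Bark bands.
    spectrum = np.abs(np.fft.rfft(window * np.hanning(len(window)))) ** 2
    freqs = np.fft.rfftfreq(len(window), d=1.0 / sample_rate)
    bands = np.clip(np.floor(bark(freqs)).astype(int), 0, 23)
    band_energy = np.bincount(bands, weights=spectrum, minlength=24)
    return np.sum(band_energy ** 0.23)  # exponent is an assumption

def find_noise_profile(signal, sample_rate, hop_seconds=0.25):
    # Slide a 1-second window over a mono signal and keep the quietest one.
    win = int(sample_rate)
    hop = max(1, int(hop_seconds * sample_rate))
    best_start, best_loudness = 0, np.inf
    for start in range(0, len(signal) - win + 1, hop):
        loudness = crude_loudness(signal[start:start + win], sample_rate)
        if loudness < best_loudness:
            best_start, best_loudness = start, loudness
    return signal[best_start:best_start + win], best_start
```

For a whole corpus, the same search would run over every file, keeping the single quietest window overall.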

Synthesis

The script can then generate more background noise by convolving the noise profile with white noise. (As a side note, it might be worth trying a more impulsive type of noise, e.g. Cauchy. Edit: I tried Cauchy noise and it didn't behave like I was hoping.) This can be done efficiently for a long segment of noise using the overlap-add method, as described, for example, here. In the case of data augmentation, the synthesized noise can be mixed into the background of the entire corpus. As a more general soundscape synthesis technique, the noise could be used in its own right as a 'silent' ambient sound (room tone).
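As a rough sketch of the synthesis step (again, not the actual script), the following Python convolves white Gaussian noise with the noise profile using a hand-rolled overlap-add. The function names, the block size, and the seed parameter are all hypothetical. Two choices here are assumptions of mine: the ramp-in and ramp-out of the convolution are trimmed so that the output is steady from the first sample, and the result is divided by the square root of the profile length so that its RMS level roughly matches that of the profile.

```python
import numpy as np

def overlap_add_convolve(signal, kernel, block_size):
    # Linear convolution computed block by block in the frequency domain.
    m = len(kernel)
    # Smallest power of two that holds one block's full convolution.
    n_fft = 1 << (block_size + m - 2).bit_length()
    kernel_fft = np.fft.rfft(kernel, n_fft)
    out = np.zeros(len(signal) + m - 1)
    for start in range(0, len(signal), block_size):
        chunk = signal[start:start + block_size]
        seg = np.fft.irfft(np.fft.rfft(chunk, n_fft) * kernel_fft, n_fft)
        seg_len = len(chunk) + m - 1
        # Each block's tail overlaps the next block and is summed in.
        out[start:start + seg_len] += seg[:seg_len]
    return out

def synthesize_noise(profile, n_samples, seed=None, block_size=4096):
    rng = np.random.default_rng(seed)
    m = len(profile)
    # Extra white noise so the trimmed output has full overlap everywhere.
    white = rng.standard_normal(n_samples + m - 1)
    full = overlap_add_convolve(white, profile, block_size)
    steady = full[m - 1:m - 1 + n_samples]
    # Assumption: rescale so the output RMS is close to the profile's RMS.
    return steady / np.sqrt(m)
```

For a 30-second clip at a sample rate of sr, this would be called as synthesize_noise(profile, 30 * sr). SciPy also provides scipy.signal.oaconvolve, which could replace the hand-written loop.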

Results

I used the script to profile the noise and then synthesize a 30-second sample for each of the 15 classes in DCASE 2016. Here are the results. For each class, I give the original stereo recording where the noise profile came from, the noise profile (a 1-second mono clip from the original recording), 30 seconds of synthesized noise, and a Bode plot of the noise profile.

THESE RECORDINGS ARE VERY QUIET BUT SOME OF THEM HAVE SUDDEN LOUD PARTS, SO BE CAREFUL WITH YOUR VOLUME KNOB

Beach



original recording:


noise profile:


synthesized noise:


Example 1: Noise profile of the lakeside beach sounds from DCASE 2016

Bus



original recording:


noise profile:


synthesized noise:


Example 2: Noise profile of the bus sounds from DCASE 2016

Cafe Restaurant



original recording:


noise profile:


synthesized noise:


Example 3: Noise profile of the cafe restaurant sounds from DCASE 2016

Car



original recording:


noise profile:


synthesized noise:


Example 4: Noise profile of the car sounds from DCASE 2016

City Center



original recording:


noise profile:


synthesized noise:


Example 5: Noise profile of the city center sounds from DCASE 2016

Forest Path



original recording:


noise profile:


synthesized noise:


Example 6: Noise profile of the forest path sounds from DCASE 2016

Grocery Store



original recording:


noise profile:


synthesized noise:


Example 7: Noise profile of the grocery store sounds from DCASE 2016

Home



original recording:


noise profile:


synthesized noise:


Example 8: Noise profile of the home sounds from DCASE 2016

Library



original recording:


noise profile:


synthesized noise:


Example 9: Noise profile of the library sounds from DCASE 2016

Metro Station



original recording:


noise profile:


synthesized noise:


Example 10: Noise profile of the metro station sounds from DCASE 2016

Office



original recording:


noise profile:


synthesized noise:


Example 11: Noise profile of the office sounds from DCASE 2016

Park



original recording:


noise profile:


synthesized noise:


Example 12: Noise profile of the park sounds from DCASE 2016

Residential Area



original recording:


noise profile:


synthesized noise:


Example 13: Noise profile of the residential area sounds from DCASE 2016

Train



original recording:


noise profile:


synthesized noise:


Example 14: Noise profile of the train sounds from DCASE 2016

Tram



original recording:


noise profile:


synthesized noise:


Example 15: Noise profile of the tram sounds from DCASE 2016

Night



noise profile:


synthesized noise:


Example 16: Noise profile of the night sounds from RPPTv

Observations

This method of finding a noise profile works better in some settings than others. In the case of 'beach' and 'cafe', it sounds pretty good. In 'residential area', there is a bird chirp right in the middle of the noise profile, which can be clearly seen in the Bode plot. This colors the resulting noise in an unintended way (the bird sound effectively gets smeared across the noise). There might be a better way of selecting the noise profile, or cepstral filtering of the profile might be used to remove this type of artifact.

Future Work

I am in the process of revisiting data augmentation using this and other techniques. More on that soon.

Comments

  1. Michael, you write some incredibly interesting posts and the research behind this is fascinating.
    I have to admit that I went off to find out more about Cepstral filtering and there appears to be a large body of research out there for speech processing-machine learning mostly, and quite a few other related projects.
    Thanks Michael, am very much looking forward to your future work!!
