These 5 definitions of audio augmented reality will blow your mind!

A taxonomy of audio augmented reality with examples.

Introduction

With the new year, I am working on a new project involving audio augmented reality (AAR). I started doing a literature review on the topic, and it turns out that I'm not even sure what the term "audio augmented reality" means. So I started grouping the projects I have seen, and I have identified at least 5 distinct situations that have been or could be, in my estimation, called AAR. These are as follows (with examples):
  1. Enchanting silent physical objects with digital sound
    • Enchanted textiles
    • Enchanted paper / books / maps
    • Enchanted footballs / sports equipment
  2. Overlay of extra audio information onto the real world
    • sat nav
    • self-guided in-ear museum tours
  3. Digital sound-objects placed in real 3d space
    • Sound attached to virtual objects in AR Games
    • Geolocated narrative
    • Geolocated music playlists
  4. Realtime digital modification of acoustic sounds
    • In-ear translation
    • Electric guitar distortion pedals
    • Table drums (analysis and resynthesis)
  5. Telepresence — audio merging of remote and local locations
    • Telephone
    • Treadmill sound walks
    • Telematic / networked musical ensembles
These are not completely mutually exclusive -- the Enchanted objects category in particular seems to overlap with nearly everything, yet there are still projects that clearly belong there and not anywhere else, and there are other projects that definitely do not belong there. I'm not sure that the list is comprehensive, but I don't know of a counterexample at this point. I do not clearly understand the precise difference between augmented, virtual, and mixed reality, particularly concerning audio, and surely someone would disagree with some of the things I have included here. Nonetheless, I will try to give some more specific examples from each category in the hopes that a suitable definition of AAR will emerge, even if by brute force; this is intended to be a descriptive definition, not a prescriptive one.

Update: I thought of a counterexample. Karaoke. This has to do with the blending of digital and physical sounds. It might be that Digital sound-objects placed in real 3d space is really about blending digital and physical, and there is a located and non-located form.

Enchanting silent physical objects with digital sound

Adding meaningful sound to an object that would not otherwise make meaningful sound.

Textiles

I have a digital textile project where I am trying to turn clothing into musical instruments. There have been several similar projects, like musical hoodies, and many musical gloves (Mi.mu, This thing, Lady Gloves, etc...), and other audio-augmented clothing and fabric.


Figure : Conceptual clothing with sensors knit into the fabric, used as a drum pad personal musical instrument.

The Human Body

Arguably, musical gloves fall into this category -- is the glove being augmented, or the body and its gestures? Moreover, many interactive or responsive dance works, for example more or less the entire output of MoCo, attempt to turn a dancer's body into a musical instrument. For example, here is a dance work where the dancers wear sensors and are tracked with overhead cameras, and their movement controls the audio. Insofar as the body is a musical instrument, it is being enchanted with sound.



Sporting Equipment

There is an increasing amount of technologically augmented sporting equipment on the market, and because a player's eyes are busy while playing, many of these communicate to the user via sound. I am involved with a company called Inside Coach that is trying to use sound to encourage youth football players to practice more by making it fun. So we augmented a football with sound. Here is a demo that uses sound to help players improve their timing and ball-control.


Figure : A football augmented with sound, where the sound is supposed to encourage people to practice longer and help them improve their timing and ball-control.

Books

There have been a number of AAR book and map projects. David Frohlich's Audiophotography project consists of photographs plus sound.


Figure : David Frohlich's Audiophotography project

There are many projects that use conductive ink to turn paper into musical instruments, for example this one, chosen randomly from YouTube:


Figure : Musical Paper

I would swear that Leah Buechley's Chibitronics used to have sound-producing elements, but perhaps I am mistaken.

Maps

Sound maps are common enough that they have their own Wikipedia page, and as I mention later, there is a list of nearly 100 of them in Brandon Metchley's Dissertation. These projects usually place recordings on a digital map, so users can hear what different locations sound like. Here is a randomly-chosen one.


Figure : Sonic map from the BBC.

Sonic maps are related to sound walks, which I have placed under Telepresence — audio merging of remote and local locations, because to me sound maps seem like they are aimed at augmenting the map itself with audio, whereas sound walks seem like they are more focused on immersing the listener in a remote location, but there is clearly some category overlap here.

Arbitrary Objects

There are many projects that attempt to imbue arbitrary objects with musical sound, such as this one. Perhaps tabletop drum kits and Mogees belong in this category as well, although I have placed them under Realtime digital modification of acoustic sounds because those involve items that make sound on their own, which is intelligently modified or replaced with digital sound.

Overlay of extra audio information onto the real world

Using audio to provide a user with additional information that is relevant to their current activity.

Navigational Aids

GPS navigation is perhaps the archetypical example of AAR information overlaid on the real world. The information is salient to your current location and situation, but the audio is not affixed to locations in real space. Perhaps to a certain degree this could be seen as an audio enchanted object (car / dashboard).

Home appliances

Your microwave oven (washer-dryer, coffee maker, etc) most likely uses digital sound to give you information about the real world, i.e. it is finished vitiating your food. To me this seems like it shouldn't qualify as AAR, but I can't think of a good reason why not. Perhaps because the appliance is really just giving information about the state of the appliance, or perhaps because the sound source can't be taken with you and used to give information about other things.

In-ear museum tours

Many museums will give you an earpiece to carry around, and when you are in the vicinity of an artwork, you will hear historical information about the work.


Figure : The Musical Instrument Museum in Phoenix, Arizona has the best in-ear audio tour of all time anywhere, period.

This is information overlay, and it also has elements of the next category, Digital sound-objects placed in real 3d space, but in this case the sound is not affixed to an exact location or object -- e.g. the sound is not meant to be emanating from the painting, so it seems to me more like information overlay that is triggered by proximity than a true placement of a virtual sound object in real space. An interesting case is the Musical Instrument Museum in Phoenix, which houses many exotic instruments, and you can hear performances on them through the earpiece. This further blurs the distinction between these categories.

Chatbots

Chatbots like Siri, Alexa, the Google Assistant, and a surprisingly large number of conversational Japanese fembots can be used to overlay audio information onto the real world. "OK, Google, should I turn left or right? Should I get out of bed or go back to sleep?"

Digital sound-objects placed in real 3d space

The treatment of digital sound as an ontological token, and placing it at a definite place in the physical world.

Geolocated Playlists

Someone suggested that with AAR, people might start leaving their favorite songs or playlists at specific locations, so you might have to go to a specific location to hear a specific song. In the future, perhaps musicians will release geolocated albums or songs.
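
To make the idea a bit more concrete, here is a minimal sketch of how a player app might decide which pinned songs a listener can hear. Everything in it is an assumption made for illustration: the pin format, the song titles, the coordinates, and the 50 m radius are all made up, and a real app would presumably get the listener's position from the phone's location service.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def audible_songs(listener, pins):
    """Return the titles of pinned songs whose radius contains the listener."""
    lat, lon = listener
    return [p["title"] for p in pins
            if haversine_m(lat, lon, p["lat"], p["lon"]) <= p["radius_m"]]


# Hypothetical pins: titles, coordinates and radii are invented for this sketch.
PINS = [
    {"title": "Song A", "lat": 47.5476, "lon": 7.5896, "radius_m": 50},
    {"title": "Song B", "lat": 33.6678, "lon": -111.9786, "radius_m": 50},
]

print(audible_songs((47.5476, 7.5896), PINS))
```

Note that this only decides whether a song plays at all. It is closer to the proximity-triggered museum tour than to a sound object that appears to emanate from a point in space, which would also need the listener's heading and some spatial rendering.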

Art / Museum Projects

There are a number of public augmented reality art projects that involve sound, where a user has to go to a particular location to hear some audio. Most of these are paired with visual AR, and use a smartphone as a viewport / audioport. For example, here is an AR museum project in a train station that depicts historical events -- you must be in the station to see the visuals and hear the audio.


Figure : Geolocated museum project found here.

I remember another particularly nice one that I thought took place at the Basel SBB, and was presented at NIME 2014 (the documentation video featured a Swiss German *attempting* to speak high German). There was, amongst other things, a virtual woman sitting on a real bench, and you could sit next to her, and she would start talking to you. I can no longer find it. Disney has a similar demo, involving a cartoon elephant. In their conception: "Our mantra for this project was: hear a character coming, see them enter the space ...", so again the virtual sound is co-located with the virtual image, which are both attached to a location in real space.


Figure : Disney AR elephant. Photo stolen from an article about it

Gaming

No discussion of AR would be complete without at least mentioning Pokemon Go. This game places virtual objects in the real world. Some of those objects have sounds attached to them. In general, such sounds in an AR game might not be stationary, but are nonetheless intended to be positioned in real 3d space as the character moves.


Figure : Pokemon Go, with virtual items and their associated sounds placed in real 3d space.


Realtime digital modification of acoustic sounds

"We define audio augmented reality as realtime computational mediation of sound..." source

Guitar Pedals

Electric guitars, on their own, tend to sound like crap, and are in some ways the ideal vehicle for all sorts of realtime digital effects processing, filters, reverb, harmonic distortion, delay loops, etc. The real sound of the guitar is augmented by computation.


Figure : A guitar pedal is used for augmenting the sound of an electric guitar.
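
As a rough illustration of what "augmented by computation" can mean here, this is a minimal sketch of one common digital approach to harmonic distortion, namely waveshaping with a tanh curve. It is not a model of any particular pedal; the function names and the `drive` parameter are my own, and a real effect would run on live audio buffers rather than a short synthesized sine.

```python
import math


def soft_clip(samples, drive=5.0):
    """Waveshape samples (floats in [-1, 1]) with tanh.

    The output is normalised so a full-scale input maps to a full-scale output.
    Larger `drive` flattens the peaks more, which adds more harmonics.
    """
    norm = math.tanh(drive)
    return [math.tanh(drive * s) / norm for s in samples]


def sine(freq_hz, sample_rate=48000, n=480):
    """Generate n samples of a unit-amplitude sine wave (a stand-in for a guitar signal)."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


clean = sine(440.0)
dirty = soft_clip(clean)
```

Because the tanh curve is symmetric about zero, the harmonics it adds to a pure tone are odd ones. An asymmetric curve would add even harmonics as well.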

Augmented acoustic musical instruments

Within the computer music community, there exists an entire practice of augmenting the natural sound of regular musical instruments with digital effects processing. For example a medieval tromba "augmented with a pickup, speakers and digital signal processing".


Figure : digitally augmented tromba

Many of these also use a variety of sensors and buttons, such that the instrument can be used as a general-purpose digital control interface, either to control the audio effects parameters, or to control purely synthesized sounds, or to control lighting or other things. Here is an augmented bass clarinet.


Figure : digitally augmented bass clarinet

Here is an augmented saxophone.


Figure : digitally augmented sax

There are metapapers on the topic, etc, etc... In many ways, these should be in the enchanted objects category, as they are regular objects that have been enchanted with digital technology. However, I have put them here because, in contrast to the other enchanted objects, these objects already make sound, and that sound is modified by technology. The other objects in the enchanted object category do not make meaningful sound to begin with -- they are silent objects that are enchanted by virtue of the addition of sound.

In-ear Translation

The Google Translate app now has all sorts of AR capabilities. On the visual side, it will overlay a translation of text on real text as seen through a phone's camera. For audio, and by analogy, it will translate speech (or sign-language) to audio in real-time. This goes a step beyond guitar pedals and augmented musical instruments insofar as here the entire audio is stripped out and replaced, not just modified with DSP.

Tabletop Drum Kits

There are several projects that attempt to turn tabletops into drum kits by processing (like a guitar pedal) or replacing (like in-ear translation) the sound of tapping on the table. The most sophisticated of these use some machine learning to identify the timbre of each stroke on the table, so that it can be replaced by the sound of some instrument that is particular to that stroke. One such project is Mogees:


Figure : Mogees

Here is another one:


Figure : Mogees

Again, these might just as well fall under the enchanted objects category, but I have put them here since the idea here is to modify the sound of something that already makes sound, rather than to add sound to something that is otherwise silent. However, this is more of an edge-case as the sound of the tabletop by itself is not that interesting without augmentation.

Telepresence — audio merging of remote and local locations

Using sound to give people the experience of being in a remote location in the real world.

Telephone

Skype, FaceTime. These all use sound to transport people out of their living room and into someone else's. To me, television, even live television, does not feel like it qualifies as AAR, perhaps because the person who is being transported does not have the experience of being transported; they do not interact with anything in the remote location, nor are they even aware of exactly where they are being transported.

Sound Walks

There are many projects that attempt to use audio to give someone the experience of walking around some location other than where they are. Some of these are attached to maps, which makes them related to enchanted objects (enchanted maps), although I think these go beyond the concept of map-as-enchanted-object insofar as they really try to place a user in a remote space, with varying degrees of agency and embodiment. Here is such a project from Brandon Metchley's Dissertation, which also contains a list of nearly 100 similar sound walk and sonic cartography projects.


Figure : Sound Walk from Brandon Metchley's Dissertation.

Here is another sound walk project by Grisha Coleman, Daragh Byrne, David Tinapple, Matthew Mosher, et al. They recorded (sound and video) themselves walking or driving around, and users watch and listen to these recordings while on a treadmill, and the rate of playback depends on the rate of walking. Maybe in some sense this too is more like an enchanted object (treadmill) with a telepresence flavour. In some respects, this is the most clearly 'augmented' (as opposed to 'virtual') telepresence example, because the virtual walk is overlaid on the real activity of walking.


Figure : Augmented treadmill with video and audio. By Grisha Coleman et al (image by Matthew Mosher).
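
I don't know how the treadmill project actually maps walking to playback, but the simplest thing I can imagine is a linear mapping: play the recording at normal speed when the user walks as fast as the person who made it, and scale from there. The sketch below assumes exactly that. The function name, the 1.4 m/s recording speed, and the upper limit are all hypothetical.

```python
def playback_rate(walk_speed_mps, recorded_speed_mps=1.4, max_rate=3.0):
    """Map treadmill speed (m/s) to a playback-rate multiplier.

    1.0 means the recording plays at its original speed.
    `recorded_speed_mps` is the (assumed) speed at which the walk was recorded.
    The result is clamped to [0, max_rate] so that standing still pauses
    playback and sprinting does not turn the audio into a chipmunk chorus.
    """
    if recorded_speed_mps <= 0:
        raise ValueError("recorded_speed_mps must be positive")
    rate = walk_speed_mps / recorded_speed_mps
    return max(0.0, min(rate, max_rate))


for speed in (0.0, 0.7, 1.4, 2.8):
    print(speed, playback_rate(speed))
```

Naively changing the playback rate of audio also shifts its pitch, so a real system would probably want some kind of time-stretching for the sound.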

Other projects are not necessarily ambulatory, but still attempt to use sound to place someone in a remote location. This webpage by Michael Krzyzaniak has spherical panoramic images paired with ambisonic recordings, so as a user pans the image, the direction of sounds stays fixed to the correct virtual location. Perhaps this is more related to the Digital sound-objects placed in real 3d space category. This also feels more like 'virtual' than 'augmented' reality: even though the remote locations (national parks) are real places, they have been recorded and made abstract. Perhaps if the video and audio were a live stream from a camera and microphone this would be more 'augmented', but perhaps not, because the user's senses are being completely replaced.


Figure : Stationary sound walk by Michael Krzyzaniak
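
For the curious, here is a minimal sketch of the general technique that keeps sounds pinned to the panorama as the view turns: rotate the first-order ambisonic (B-format) sound field about the vertical axis by the opposite of the view angle. This is the textbook rotation, not necessarily what the webpage does. The function names are my own, and I assume a convention where X points forward, Y points left, and angles increase counterclockwise. W gain conventions vary between formats and are ignored here.

```python
import math


def encode(sample, azimuth_rad):
    """Encode a mono sample as a horizontal first-order B-format frame (W, X, Y, Z)."""
    return (sample,
            sample * math.cos(azimuth_rad),
            sample * math.sin(azimuth_rad),
            0.0)


def rotate_yaw(w, x, y, z, yaw_rad):
    """Rotate a B-format frame about the vertical axis; W and Z are unaffected."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return w, c * x - s * y, s * x + c * y, z


def compensate_view(w, x, y, z, view_yaw_rad):
    """Counter-rotate the sound field so sources stay fixed in the scene."""
    return rotate_yaw(w, x, y, z, -view_yaw_rad)


# A source 30 degrees to the left; the user pans the view 30 degrees to the left.
frame = encode(1.0, math.radians(30))
w, x, y, z = compensate_view(*frame, math.radians(30))
# The source should now be straight ahead: all of its energy in X, none in Y.
```

In practice this would be applied to every sample of the four-channel recording before decoding it to headphones or speakers.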

Telematic Musical Instruments

Whereas sound walks typically overlay one remote sound environment on one other local one, telematic musical instruments tend to involve many locations somehow overlaid. In the Telematic Drum Circle, many pneumatically controlled drums are in an otherwise empty room, and anyone can play one of the drums remotely by going to a website and tapping on their computer keyboard. Many people can do this simultaneously. In this scenario, one sonic environment is being broadcast out to many people and overlaid on the activity of 'drumming' on a computer keyboard.


Figure : Telematic Drum Circle, in which one sonic location is transported to many remote people and overlaid on the activity of 'drumming' on a computer keyboard.

In other scenarios, many people might be playing real musical instruments in separate locations, and the sounds are blended together and broadcast to everybody. Normally latency would be an issue. I did see one paper (that I can no longer find) where they were doing this with percussion instruments, and they were using computer vision and / or accelerometers to anticipate the strokes, thereby overcoming latency. Eric Whitacre's virtual YouTube choirs overcome the latency by simply not operating in real time. Many people just separately sing one part of a choir piece and later they are stitched together into a composite whole.


Figure : Eric Whitacre Virtual Choir

In this case, I think this is more 'virtual' than 'augmented', in the sense that each singer just sings into a microphone by themselves and that is the end of their involvement. But, it is easy to imagine a low-latency future where this can be done in realtime and each singer can hear all other singers while singing. I think that would be a pretty clear case of telematic AAR.
