Hack 60. Pay Attention to Thrown Voices
Sounds from the same spatial location are harder to separate, but not if you use vision to fool your brain into "placing" one of the sounds somewhere else.
Sense information is mixed together in the brain and sorted by location [Hack #54], and we use this organization in choosing what to pay attention to (and therefore tune into). If you're listening to two different conversations simultaneously, it's pretty easy if they're taking place on either side of your head: you can voluntarily tune in to whichever one you want. But let's say those conversations were occurring in the same place, on the radio: it's suddenly much harder to make out just one.
Hang on...how do we decide on the spatial location of a sense like hearing? For sound alone, we use clues implicit in what we hear, but if we can see where the sound originates, this visual information dominates [Hack #53].
Even if it's incorrect.
5.9.1. In Action
Jon Driver from University College London¹ took advantage of our experience with syncing language sounds with lip movements to do a little hacking. He showed people a television screen with a person talking on it, but instead of the speech coming from the television, it was played through a separate amplifier and mixed with a second, distracting, and completely unrelated voice. The television screen was alternately right next to the amplifier or some distance away. The subject was asked to repeat the words corresponding to the talking head on the television.
If they watched the talking head on a screen next to the amplifier, they made more errors than if they watched the talking head on a screen kept at a distance from the sound. Even though both audio streams came from the single amplifier in both cases, moving the video image considerably changed the listener's ability to tune into one voice.
This experiment is a prime candidate for trying at home. An easy way would be with a laptop hooked up to portable speakers and a radio. Have the laptop play a video with lots of speech in which you can see lip movements. A news broadcast, full of talking heads, is ideal. Now put the radio, tuned to a talk station, and the laptop's speakers in the same location. Together they stand in for the single amplifier in Driver's experiment. The two different cases in the experiment correspond to your laptop being right next to the speakers or some feet away. You should find that you understand what the talking heads on the video are saying more easily when the laptop is farther away. Give it a go.
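If you want to compare the two laptop positions with something firmer than a general impression, you could write down what the talking head actually said and what you managed to repeat, then count the words you missed. The following is only a rough sketch of one way to do that scoring; it is not the procedure Driver used, and the function name and the sample sentences are made up for illustration.

```python
from difflib import SequenceMatcher


def count_shadowing_errors(spoken_text, repeated_text):
    """Count target words the listener failed to repeat in order.

    Hypothetical scoring rule for the home experiment: a target word
    counts as an error if it has no in-order match in what the
    listener said.
    """
    target = spoken_text.lower().split()
    repeated = repeated_text.lower().split()
    # Align the two word lists and add up the lengths of the matching runs.
    matcher = SequenceMatcher(a=target, b=repeated, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return len(target) - matched


# Invented example data: one sentence from the video, and what a
# listener repeated with the laptop near the speakers and far from them.
target = "the minister said the budget would be published on friday"
near = "the minister said the would be on"
far = "the minister said the budget would be published friday"

print("errors with laptop near speakers:", count_shadowing_errors(target, near))
print("errors with laptop far from speakers:", count_shadowing_errors(target, far))
```

A single sentence proves nothing either way, so you would want to score a good number of sentences in each position before comparing the totals.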
5.9.2. How It Works
It's easier to understand what's going on here if we think about it as two separate setups. Let's call them "hard," for the case in which you're looking at the television right by the amplifier and "easy," when you're looking at the screen put a little further away.
In the hard case, there's a video of a talking head on the television screen and two different voices, all coming from the same location. It's hard because tuning out of one information stream and into another is easier when they're in different locations (which is what [Hack #54] is all about), and here they aren't. The fact that there's a video of a talking head showing in this case isn't really important.
The easy setup has one audio stream tucked off to the side somewhere, while a talking head and its corresponding audio play on the television. It's plain to see that tuning into the audio on the television is a fairly simple task: I do it whenever I watch TV while ignoring the noise of people talking in the other room.
But hang on, you say. In Driver's experiment, the easy condition didn't correspond to having one audio stream neatly out of the way and the one you're listening to aligned with the television screen. Both audio streams were coming from the same place, from the amplifier, right?
Yes, right, but also no. Strictly speaking, both audio streams do still come from the same place, but remember that we're not very good at telling where sounds come from. We're so poor at it, we prefer to use what we see to figure out the origin of sounds instead [Hack #53]. When you look at the screen, the lip movements of the talking head are so synchronized with one of the audio streams that your brain convinces itself that the audio stream must be coming from the position of the screen too.
It's whether the video is in the same place as the amplifier that counts in this experiment. When the screen is in a different place from the amplifier, your brain makes a mistake and mislocates one of the audio streams, so the audio streams are divided and you can tune in one and out the other.
Never mind that the conversations can be tuned into separately only because of a localization mistake; it still works. It doesn't matter that this localization was an illusion: the brain could still use the illusion to separate the information before processing it. All our impressions are a construction, so an objectively wrong construction can have as much validity in the brain as an objectively correct construction.
5.9.3. End Note