Tuesday, January 4, 2011

The Two Nations of Sound Reinforcement

I got called last month for an emergency come-in-and-FOH-our-show-the-last-sound-guy-left-suddenly. The design I was working off of was unique in several ways that I'm going to be thinking on for a while. But it did propel me to expand on my remarks earlier about there being two schools of reinforcement design.



I've seen this happen more than once: a sound person -- with lots of experience, credits in working good venues, etc. -- comes in to hang wireless mics on the cast. The first thing they do is ring out the room. The next thing they do is bring each and every member of the cast center stage, to stand there silently while the designer rings out their individual wireless mic. When every possible ounce of gain before feedback has been achieved, they crank up the compression to 4:1, and open up the mics of every person on stage in each scene, holding them all just below feedback threshold.

This means earsplitting levels and a crunchy, distorted sound, with that distinctive tinny ring that comes from mics driven right below the feedback threshold. This means dialog is too loud, singing may still be too soft, ensembles never blend, and the few cast members who either didn't get a mic or had a mic go dead on them are completely inaudible. As the show continues, hearing fatigue sets in and the sound seems to get softer and softer and softer just as the show should be getting bigger and bigger.

You can probably guess I don't belong to that school.

In my opinion, the sole advantage of that method is that it (temporarily) shuts up the people who want you to "turn it up" because they can't hear something. It also satisfies those occasional Upper Management visitors who give you a quiet but firm, "The sponsors are in the audience tonight, and they'd like to hear how good that $60,000 sound system they bought for us works. So you could show it off a little, hmmm?" (By which they mean -- loud is good, so make it loud.)

Of course, since you opened the show with cranking levels, you've already shot your wad. There is no place to go if a louder song comes along, or a singer gets a dry throat. Or, as the evening wears on, what had been loud begins in context to feel quiet.



Hearing fatigue is a very real thing. High frequencies fatigue faster, meaning that as exposure to high-decibel sound goes on, the sound begins to seem increasingly heavy and dull and needs to have high frequencies boosted to sound normal again. Which is why experienced mixers keep levels moderate and stop every few hours to recover their ears, lest they create mixes which are tinny and obnoxious.

There are two illusions going on here; two very familiar illusions that crop up all the time in my work, and are often at the base of the anti-conspiracy theory writings I do. The first is that human senses are calibrated. The second is that sensory information can be collapsed into a single quantity. But they both can be brought under the header of "the senses don't lie." Which, in fact, they do...constantly and creatively!

Let's unpack the former illusion first; the one that senses are calibrated (i.e. "Loud" always seems "loud," and "red" always seems "red.") Everyone should have had the experience at some point in their lives of wearing colored sunglasses. A modern version I've been enjoying is using a red LED to see, and even a time or two to read in bed! After forty-five minutes or so, the automatic color-correction of the human eye has made that red light more of a dark amber (it can't QUITE compensate enough to perceive it as fully white). But when I switch it off and turn on the room lights, the white walls and fixtures are, for a few minutes, the most lovely blue-green.

The human eye automatically and constantly adjusts its white point, and so subtly you are rarely aware of it. You "see" a sheet of white paper as being white out in the noontime sun, and at night under an incandescent desk lamp, even though the color temperatures of those two sources are some two thousand degrees Kelvin apart. Indeed, to make that sheet of paper appear "yellow" in your perception you have to drop down to the color temperature of a candle -- roughly another thousand Kelvin below the desk lamp.

Film photographers may remember "tungsten" film versus "daylight" film, which was the retail version of color balancing film stock towards the lighting environment. Shoot indoors on daylight film, and the film would faithfully record a yellow-orange world. But since it wasn't the world our color-adapted eyes had seen, we cheated the film to record something closer to what we perceived. (In the cinema world the choices, and the filters, were much more complex.)

(This, by the way, is but ONE of the reasons why it is so insanely difficult to answer "What color is the Martian sky?" The scientifically illiterate have this illusion that color is color, and NASA can either print the right color on a photograph, or print the wrong color for some nefarious purpose of their own.)

But back to the subject. The accompanying illusion is that our perceptual tools are linear. That is, that twice the power into an amp will deliver a sound that feels twice as loud, and that two sounds of equal power-into-amp will be perceived as equally loud. This is also not true. Our perception of loudness is moderated in part by a hard-wired expectation in our ears of a certain balance of sounds. A similar rule is what makes a super-bright "White" LED look white. It is actually nothing of the kind. But the LED provides peaks that, in any ordinary light, would be part of a white-light spectrum. Our brain then interprets this as being a black-body curve of some characteristic color temperature.

(A very peculiar related illusion has to do with clock chimes of the old rectangular-bar style. You see, any musical tone is made up of a fundamental frequency, and the even and odd harmonics of that frequency. The characteristic timbre of a sound comes from the mix of these harmonics. Well, in the case of the clock chimes, the vibrational modes of the rectangular chime are perceived by the ear as being, not fundamental tones, but as harmonics of a fundamental. The brain fills in this "missing" fundamental, and what you perceive is a lower pitch than would be possible for a six-inch long chunk of brass.)
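To make the "missing fundamental" idea concrete, here is a minimal sketch. It assumes idealized partials at whole-number frequencies that are exact multiples of some lower pitch; the function name and the example frequencies are made up for illustration and are not measurements of any real chime. Real bars produce partials that are only roughly harmonic, and the ear does a fuzzier best-fit than this.

```python
from functools import reduce
from math import gcd


def implied_fundamental(partials_hz):
    """Return the highest frequency of which every partial is a whole multiple.

    Assumes integer frequencies in Hz that are exactly harmonic.
    """
    return reduce(gcd, partials_hz)


# Hypothetical partials for illustration only.
partials = [400, 600, 800]
print(implied_fundamental(partials))  # 200
```

None of the partials is at 200 Hz, yet 200 Hz is the only pitch that explains all three as harmonics, and that is the pitch the brain tends to supply.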

The non-linear response across frequency in the ear is what gave rise to the old "loudness" knobs on consumer stereos; this knob would selectively increase frequencies in the bass and treble range that were characteristic of the profile of a louder sound. The brain, fooled, would perceive the sound as "louder" even though the power requirements of the amp had only changed marginally.

(And, yes, I'm over-simplifying here to make a point).
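For a rough sense of how far from linear this is, here is a small sketch using the common rule of thumb that a 10 dB increase is heard as about twice as loud. That rule is my assumption here: it is only approximate, it applies best to mid-range sounds at moderate levels, and it ignores the frequency balance discussed above. The function names are illustrative.

```python
import math


def power_ratio_to_db(ratio):
    """Convert an amplifier power ratio to a level change in decibels."""
    return 10.0 * math.log10(ratio)


def perceived_loudness_ratio(db_change):
    """Rule of thumb (assumed): every +10 dB sounds about twice as loud."""
    return 2.0 ** (db_change / 10.0)


def power_ratio_for_loudness(loudness_ratio):
    """Power ratio needed for a given perceived loudness ratio, same rule."""
    db_needed = 10.0 * math.log2(loudness_ratio)
    return 10.0 ** (db_needed / 10.0)


doubled_power_db = power_ratio_to_db(2.0)
print(f"Twice the power: +{doubled_power_db:.1f} dB")
print(f"Heard as about {perceived_loudness_ratio(doubled_power_db):.2f}x as loud")
print(f"Power needed for 'twice as loud': {power_ratio_for_loudness(2.0):.0f}x")
```

Under that rule, doubling the power buys about 3 dB, which is heard as only around a quarter louder, while "twice as loud" takes something like ten times the power.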

These illusions are the bane of my existence as a lighting and sound designer. I have, through experience and training and knowledge, the ability to compensate intellectually for the perceptual effects. My clients not only lack these tools, they are unaware of the need for these tools.

I have had directors come in at noon and complain that I changed the lights -- because, "Last night it looked great but today everything is all yellow." And I have had similar conversations about sound. Most people are simply incapable of understanding when they don't understand; as far as they are concerned, the light is yellow, or the sound is loud, and it is so obvious it can't possibly be wrong (and any explanations are simply obfuscation).



To add one more complexity to the mix, the primary goal in sound reinforcement, especially for vocals (stage singing and acting), is intelligibility. Intelligibility is not volume! This doesn't work for the stereotypical American tourist shouting English words at the poor local, and it doesn't work when the audience is wrinkling their brows trying to figure out why a Plane in Maine rests lightly on the Brain.

Professionals have formulas, and a specialist vocabulary, for describing intelligibility and the factors that impact it. I'm going to try to go for a simpler, plain-language explanation here. To wit, intelligibility is dependent on:

1) Sufficient sound pressure in the frequencies that carry useful information (that is: vowel sounds, 500 Hz to 1 kHz; consonants, 2-4 kHz; fricatives and sibilants, which contain energy at 6-8 kHz. POTS -- the Plain Old Telephone System -- achieved voice intelligibility with a frequency range of 300 to 3400 Hz).

2) Lack of significant distortion in terms of time/phase smearing of the information in these frequencies.

3) Lack of competing material, whether in the same frequency band and the same time domain, or out of band and either generally too powerful and/or too busy, or of a nature (strong single pitches, particularly at neighboring frequencies) as to cause acoustic masking. (The latter, to simplify again: given a strong signal peak, the ear actually shuts down activity in the neighboring bands.)
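As a quick illustration of the first point, this sketch takes the rough band edges quoted above and checks how much of each one a given passband lets through. Measuring overlap on a linear Hz scale is my simplification, and the names are illustrative; this is not one of the professional intelligibility formulas.

```python
# Band edges in Hz, taken from the rough figures in the list above.
SPEECH_BANDS_HZ = {
    "vowels": (500, 1000),
    "consonants": (2000, 4000),
    "fricatives and sibilants": (6000, 8000),
}


def band_coverage(passband_hz, bands=SPEECH_BANDS_HZ):
    """Return the fraction of each speech band that lies inside the passband."""
    low, high = passband_hz
    coverage = {}
    for name, (band_low, band_high) in bands.items():
        overlap = max(0, min(high, band_high) - max(low, band_low))
        coverage[name] = overlap / (band_high - band_low)
    return coverage


# The telephone passband quoted above: 300 to 3400 Hz.
pots = band_coverage((300, 3400))
for name, fraction in pots.items():
    print(f"{name}: {fraction:.0%}")
```

By this crude measure the telephone passes all of the vowel band and most of the consonant band while dropping the sibilant band entirely, which fits the point that intelligibility lives in a fairly narrow slice of the spectrum.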

One of the prime factors against intelligibility in a sound reinforcement situation is reverberation, and the similar effect of signal path delays. Simply put, both of these put the voice in competition with itself; with another complex signal sharing the same time and frequency domain.

(Actually, our brains are very good at summing information that arrives within a certain window of time. If a person is talking at you in an ordinary room, chances are a good half of the acoustic energy reaching your ears is not directly from their mouth, but has followed various reflection paths. Within 20 milliseconds or so, the brain can sum all of these reflections up to create for itself a stronger and better defined original; actually enhancing the reception of the sound. Outside of that window, the reflections begin to be treated as a second, competing sound instead.)
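To see what that window means in terms of room geometry, here is a minimal sketch that turns path lengths into arrival delays. The speed of sound (343 m/s, roughly room temperature) and the example distances are my assumptions; the 20 ms cutoff is just the round figure from the paragraph above, and the real boundary is not that sharp.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 C
FUSION_WINDOW_MS = 20.0  # the "20 milliseconds or so" figure from the text


def extra_delay_ms(direct_path_m, reflected_path_m):
    """Return how many milliseconds later the reflection arrives."""
    extra_distance_m = reflected_path_m - direct_path_m
    return extra_distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


def is_fused(direct_path_m, reflected_path_m, window_ms=FUSION_WINDOW_MS):
    """True if the reflection arrives inside the summing window."""
    return extra_delay_ms(direct_path_m, reflected_path_m) <= window_ms


# Hypothetical listener 5 m from the talker.
for reflected_m in (8.0, 15.0):
    delay = extra_delay_ms(5.0, reflected_m)
    verdict = "reinforces" if is_fused(5.0, reflected_m) else "competes"
    print(f"{reflected_m:.0f} m path: {delay:.1f} ms late, {verdict}")
```

At that speed, 20 ms works out to a little under 7 meters of extra travel, so a nearby wall helps and the back wall of a large house does not.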

This is why amplification is not a panacea. As you amplify the voice, you also bring more of the reverberation around the room into perceptual range. Also, your sound system has various delays, acoustic and electronic, built into it. The end result is to smear the voice across time and make it harder to pick out the important information at the same time you've boosted that information. As you increase volume further, you start to fall into a circle of death, where each increase only makes the voice harder to understand. (Among all the various effects are secondary vibrations brought out by the strong signal; from vibrating wall panels and fixtures, to tiny waves inside the inner ear itself, until it shuts down to protect itself. A sound that is loud enough becomes distorted inside the human ear itself.)

The "other" school of reinforcement design only exacerbates this problem by using severe EQ to remove those same frequencies you wanted in the first place! So instead of enhancing the 2 kHz range that would lead to a more intelligible voice, you are boosting 200 Hz rumble and 8 kHz sizzle that act to mask the very frequencies you need to understand the words of the singing and dialog. The extreme levels, and the ringing of near-feedback, only ensure that aural fatigue will set in quickly, and the ears of the listeners become increasingly less able to pick out the voices you purportedly were reinforcing.

The saddest part is that clients, directors, sponsors, and audience members will all strain to make out words (and largely fail), but will never think of blaming the sound designer, because the pain in their ears is all the evidence they need that the sound is nice and loud.



I think there is a reason I am getting asked back to a growing list of organizations. I've been walking into theaters, often when there is clear evidence of the "volume at any cost" school having been working there before, and I've made a sound that made people happy.

Not at first. Oh, the arguments at first, when people realize their ears aren't hurting, and decide that the sound man must be Doing Something Wrong. And when it seems so obvious that something needs to be turned up, and I put my back up and don't let them do it (finding instead some other way, generally through corrective EQ and the lowering of competing levels, to achieve what had been desired).

Yes, it's the simple answer to turn it up. If you can't hear the guitar, then "turn it up." Well, a smart mixer will first look to see why the guitar isn't speaking. Is it improperly EQ'd? Does it not sound like a guitar? Is the sound muddy and unfocused? Are those frequencies that give it its distinctive timbre being suppressed?

And if it is a good guitar sound but still not speaking, you look for ways not just to power through the rest of the band, but to sidestep it. Perhaps a little panning will bring it into a place on the soundstage where it will speak. Perhaps a little EQ will bring out frequencies for which there is a gap it can speak through.

Usually, you end up tweaking a couple of things. Maybe it's clashing with the piano. So pan the piano one way, the guitar the other way. Nudge the guitar at 400 Hz (for that "woody" sound) and 12 kHz (for the finger noise), and squish the piano so it is mostly speaking in the 1-4 kHz mid-range.

Sorry for the seeming digression. The same kinds of things can also be done to let the clarity of the singer come through the acoustical environment.

To my mind, the first step is to make the singer sound good. And sound like themselves. Instead of a mic that is rung out to the ninth degree, start flat, roll off the unneeded bass (when I can, I brick-wall somewhere around 125 Hz; many boards only have a more gently sloping filter available, and the roll-off might have to start as high as 400 Hz to get the desired effect down in the 100s), and maybe take a little off the top, where little but sibilants and feedback live. Run it through a basically flat system, at a gentle level, and tweak the mic just a little to bring out the individual character of the singer.
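To show why a gentler filter has to start so much higher, here is a rough sketch that computes how much a high-pass filter cuts at a given frequency. It assumes an ideal Butterworth response, which is my assumption for illustration; an actual console's filter may be shaped differently, and the function name is made up.

```python
import math


def highpass_attenuation_db(freq_hz, corner_hz, order):
    """Attenuation in dB of an ideal Butterworth high-pass filter.

    order 2 is a 12 dB/octave slope; higher orders approach a brick wall.
    """
    ratio = corner_hz / freq_hz
    return 10.0 * math.log10(1.0 + ratio ** (2 * order))


# A gentle 12 dB/octave filter with its corner up at 400 Hz.
for freq in (100, 200, 400, 1000):
    cut = highpass_attenuation_db(freq, corner_hz=400, order=2)
    print(f"{freq} Hz: -{cut:.1f} dB")
```

With the corner at 400 Hz, that gentle slope is about 24 dB down at 100 Hz and 3 dB down at 400 Hz itself, so you get the cut you wanted in the 100s at the price of shaving a little off the low mids.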

With that for a starting point, you don't have to run levels up to the sky in order to carry along the broken fragments of the intelligibility frequencies you wanted to boost. You don't have to ramp up the levels until you've saturated the house, reverberation time has stretched into the tens of seconds, and all the lighting instruments on the catwalk are rattling in sympathetic vibration.

The downside is you don't have quite as much sheer gain. You do get more bang for your buck, though; what gain you have gets you much more in intelligibility. And nothing prevents you from notching out a few of the worst offenders -- just so long as you don't go so far it totally changes the character of the voice.



And, well, this essay has gotten long enough. Perhaps later I'll go into more detail about the "other" other school; the one that believes in enhancing the natural sound of the person or instrument being mic'd, and controlling the entire sonic environment, rather than reaching for dramatic EQ as a way to win a volume war.

2 comments:

  1. Hey Mike, First off your blog is great and I wish I would have found it a few years ago.

    I have a question for you. Since you subscribe to the 2nd school you talked about in this article, do you still EQ the house before anything or do you let the individual mic's EQ take care of everything?

    Replies
    1. That, and more.

      First off is to get a system that is flat and "tuneful." (As always with setting up the house EQ (and delays, etc.) you need both measurements and ears.)

      The flat house is your starting point for music playback, effects playback, and of course microphones.

      Then you make a few adjustments to the microphone bus; these are effects that will only apply to the wireless microphones. I often hang a graphic to gently notch out the worst feedback and room tones, as well as add a slight 4-6 kHz boost and a little more low end roll-off. Plus this bus often has a gentle compression on it.

      Often a show will have a particular flavor, and I'll set up an overall mic EQ that gently favors some parts of the frequency band. This is also where I make decisions about the amount of digital delay, and the "taper"; the relative vocal level between different speakers.

      These "mic bus" settings are tailored for the needs and desired sound of each specific show (although I'll often save and re-use them!)

      This means that when I get down to individual microphones, the only EQ and compression on them is what brings out the character of the individual singer (or attempts to cover their worst deficits).
