In mic'ing a stage musical, one of the most basic choices is between reinforcement and amplification. The former is the more challenging, especially as spaces get larger. "Oklahoma," for instance, is an old-school musical and aesthetically demands the illusion that you are listening to the people on stage as their voices carry out across the open plains.
To achieve this effect requires clever speaker arrangements, careful adjustment of delay lines, and of course gentle amplification.
The opposite case is a show like "The Wiz" or "Grease": pop songs sung with vocal techniques characteristic of amplified voices, as well as instrumentation whose sound is defined by electronics (keyboards, electric guitars, etc.) as opposed to the strictly acoustic strings and woodwinds of the classic Broadway pit.
At my current theater our overall aesthetic (imposed at least in part by the quality of the voices we have available, our budgets, and the number of [noisy!] young people in our audiences) is to go in a direction of amplification for most shows. "Sound of Music" is the only show in my recent memory that really strove to create the illusion of un-amplified voice and band -- an illusion that the very young voices of our von Trapp children could not sustain.
At a previous theater, our, well, terrible equipment meant the amplified voice didn't match the natural voice. This made for a sort of uncanny valley you navigated with great care; below a certain volume, the reinforcement backed up the voice, but above that volume it took over and changed the character in an obvious (and often unpleasant) way.
My systems now are a much closer match, allowing me choice of where to go on that line; from imperceptible amplification to the point at which the natural voice is completely masked.
What makes the difference between reinforcement and amplification? In my mind, two elements dominate: volume and delay. Following more distantly are acoustic placement (whether the vocals appear to come from a sonic space related to the stage, or whether they appear to come from speakers or another clearly defined point) and processing; the more pop material is usually treated with harder compression and a lot more reverb.
But here's where we get into system design. In both, the dominant concern is managing the wavefront. What hits audience ears is a combination of direct sound from actors and musicians, direct sound reflected off scenery and room, amplified sound from various speakers, amplified sound from foldback monitors (aka backline leakage), and amplified sound reflected by the room.
It is of course impossible (unless you are Meyer) to control all of these sources so they line up in the desired temporal order at all seats in the audience. But it is largely by juggling the temporal order that you attempt to manage at least some aesthetic goal out of what is otherwise sonic chaos.
My "flat field" approach for amplified music is actually quite simple. The speakers are time-aligned so energy from them all reaches the audience at roughly the same moment (obviously this relationship is different for each seat, meaning a lot of variance in seats across more than one zone of coverage). There is the smallest possible overall system delay and the system is run hot; overpowering any direct acoustic contribution.
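The time-alignment arithmetic can be sketched in a few lines. This is only an illustration under assumed numbers: the speaker names, the distances to a single reference seat, and the 343 m/s speed of sound (dry air near 20 °C) are all hypothetical, and a real room calls for measurement rather than tape-measure geometry.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air near 20 °C


def alignment_delays_ms(distances_m):
    """Return the delay (ms) to add to each speaker so that all of them
    arrive at the reference seat together with the farthest speaker."""
    flight_ms = {
        name: distance / SPEED_OF_SOUND_M_S * 1000.0
        for name, distance in distances_m.items()
    }
    latest = max(flight_ms.values())
    return {name: latest - t for name, t in flight_ms.items()}


# Hypothetical distances from each speaker to one reference seat, in meters.
distances = {"mains": 12.0, "front_fill": 4.0, "under_balcony": 3.0}
delays = alignment_delays_ms(distances)

for name, delay in delays.items():
    print(f"{name}: add {delay:.1f} ms")
```

The farthest speaker gets no added delay, which keeps the overall system delay as small as possible. The result holds exactly only for the one reference seat, which is the seat-to-seat variance noted above.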
This means the audience member is basically presented with a mix coming from a speaker. Any direct vocal energy from the stage, or backline leakage, arrives later and at a lower level and is psycho-acoustically folded in to the existing room reflections. In simpler language; the actual voice from the actual stage is perceived as part of the reverb tail of the processed sound.
The obvious drawback is that this requires powering over all other sources. If the drummer is loud (which they often are for pop shows!) I have to power over them; the original sound of the drummer has to reach audience ears as a distant, muffled echo of the heavily processed, close-mic'd sounds of the kit.
This means skirting local noise ordinances, frightening matinee audiences full of little children, and fighting aural fatigue. It also means that the equipment becomes super-critical; lose a wireless mic and you lose that singer. And a musician who is careless with the position of their mic can spoil the entire mix.
By contrast, the reinforcement technique primarily relies on an overall system delay. In this case, you are aiming for the magic corridor of about 10 milliseconds. Acoustic energy that falls within this corridor is perceived as originating spatially from the first source the listener hears. As long as the wavefront from the sound system falls slightly behind the direct acoustic sound of the actor, and the level is moderate (in a perfect world, you can get up to 10 dB OVER the original sound without it being perceptible at all!) an illusion is maintained.
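As a rough sketch of how such a system delay might be worked out for a single seat: the function name, the distances, and the 5 ms target lag (an arbitrary value inside the roughly 10 ms corridor) are all assumptions for illustration, and only straight-line paths are considered.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air near 20 °C


def reinforcement_delay_ms(actor_to_seat_m, speaker_to_seat_m, lag_ms=5.0):
    """Delay (ms) to add to the speaker feed so its sound reaches the seat
    lag_ms after the actor's direct sound. Never returns a negative delay."""
    direct_ms = actor_to_seat_m / SPEED_OF_SOUND_M_S * 1000.0
    speaker_ms = speaker_to_seat_m / SPEED_OF_SOUND_M_S * 1000.0
    return max(0.0, direct_ms + lag_ms - speaker_ms)


# Hypothetical seat: 10 m from the actor, 8 m from the nearest speaker.
delay = reinforcement_delay_ms(10.0, 8.0)
print(f"add {delay:.1f} ms of system delay")
```

When the speaker is already much farther from the seat than the actor, the function returns zero: the speaker is late without any help, and no delay setting can pull it forward.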
The more tricky element is that each source falls off in a different taper; the vocal energy of the singer is semi-directional, the sound of the pit orchestra is nearly omnidirectional and falls off roughly inverse-square, and although each single speaker is even more directional than the voice, the combination of all speakers in the tuned system can "fall off" in any arbitrary manner chosen.
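For reference, the inverse-square taper of an idealized point source in a free field works out to about 6 dB lost per doubling of distance. The function name and sample distances below are illustrative, and real rooms, directional voices, and tuned speaker systems will all depart from this ideal.

```python
import math


def inverse_square_drop_db(distance_m, reference_m=1.0):
    """Level lost (dB) going from reference_m to distance_m for an ideal
    point source in a free field (inverse-square law)."""
    return 20.0 * math.log10(distance_m / reference_m)


# Hypothetical listening distances from the pit, in meters.
for distance in (2.0, 4.0, 8.0, 16.0):
    drop = inverse_square_drop_db(distance)
    print(f"{distance:4.0f} m: {drop:.1f} dB below the level at 1 m")
```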
So the tough task is to not just create the correct delay and an appropriate volume, but to maintain this relationship of delay and relative volume across a complex acoustic space. And this is a sensitive balance; if the drummer plays out, or the dancers demand more piano in their monitors, or a singer is tired and marking, the entire relationship can fall apart.
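To see why the timing half of that relationship is hard to hold across a room, here is a sketch that checks the arrival lag at a few seats. The floor-plan coordinates, the single speaker, and the 8 ms system delay are made-up values; it uses straight-line paths only and ignores reflections and level.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air near 20 °C


def arrival_lag_ms(actor_xy, speaker_xy, seat_xy, system_delay_ms):
    """Milliseconds by which the speaker's sound trails the actor's direct
    sound at a seat. A negative result means the speaker arrives first."""
    direct_ms = math.dist(actor_xy, seat_xy) / SPEED_OF_SOUND_M_S * 1000.0
    speaker_ms = math.dist(speaker_xy, seat_xy) / SPEED_OF_SOUND_M_S * 1000.0
    return speaker_ms + system_delay_ms - direct_ms


# Hypothetical floor plan in meters; y = 0 is the proscenium line.
actor = (0.0, -3.0)    # 3 m upstage of the proscenium line
speaker = (5.0, 0.0)   # one main speaker on the proscenium line
seats = {
    "front center": (0.0, 4.0),
    "rear center": (0.0, 20.0),
    "front side": (6.0, 4.0),
}

for name, seat in seats.items():
    lag = arrival_lag_ms(actor, speaker, seat, system_delay_ms=8.0)
    print(f"{name}: speaker trails direct sound by {lag:.1f} ms")
```

The lag differs from seat to seat because the path-length difference between actor and speaker changes with position, so a single delay setting can only be a compromise.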
And because even something as basic and simple as The Precedence Effect (this psychoacoustic principle that can cause a reinforcement source to become literally imperceptible to all but trained ears) is impossible to explain to most people, the poor sound designer finds themselves completely helpless as those elements above (drummer, piano, marking singers, etc.) are arbitrarily changed by people who can't accept that they just wrecked the sound of the entire show.
Which brings me around to a new approach. It is an approach of basically brute force and ignorance, driven by bogosity, and will only work for some material. I call it the single-point solution, and it is what I am using on "Poppins."
The system is made possible by the fact that the band is in the pit, roughly along the same proscenium line as the main house speakers and the primary set of foldback speakers. So I've started by not even trying to control the band. Then I push even more band at the stage with the most insanely loud foldback I've ever used. However, because of the physical alignment, this is more-or-less perceived by the audience as just more pit leakage.
The mains are run hot with vocals, with system delay aligned to that same proscenium edge. This means that, roughly, all the various sources congeal in time at the proscenium edge. Since most of the amplification of the band is from foldback, it falls off as the natural pit does; in almost pure inverse-square. And since the vocals are shoved into just the mains (and the down-facing front fill) and are backed by stage energy (and vocal leakage from the insane levels of the band's own monitors; yes; vocals are feeding back from the pit before they feed back from the mains) they also come close to the inverse-square -- particularly when an entire ensemble is singing.
Or, rather, the colder I run the vocals, the closer the match will be to the taper from the band and stage noise. The lightly-amplified ensemble fares best; exposed soloists, less well.
With the delay speakers turned off, the wavefront that hits the majority of the audience is pretty much characterized as a large speaker the width of the proscenium, plus reverberation from the room. And at FOH in the back of the room, the reinforced sound and the direct sound (both desired sound like singing and un-desired sounds like stage noise) are in roughly the same volume relationship as that which the majority of the audience hears.
Where this falls down is obvious; the phase relationships are completely shot, and the musical material is smeared across time. Intelligibility suffers, and fast arpeggios become sheer mush. The band suffers most here, as the majority of what is heard by the audience has bounced off the set walls and returned to them in a nicely time-smeared multi-path.
The energy content across frequencies is even worse; most of the sources are indirect and highs are of course absorbed more readily, meaning the sound is heavy and muddy. And I can't make it up with the usual trick of putting more highs in the mains, because that would pull them out of the unified front.
Where it goes worst is that the uncanny valley is back; if I push amplification too hot to try to power over, say, the tap dancing in "Step in Time," the reverberant, time-smeared, highs-attenuated material is pushed aside by direct, dry, full-frequency sound.
From close-mic'd instruments, of course; with the ensemble tap dancing directly over the pit, the only way to get the clarity in the foldback the ensemble requires is to stick the band microphones practically into the bells of the horns. Which means the amplified component of the total mix doesn't match at all what is leaking from the pit, and the band audibly changes sonic character when I have to shove the faders up to cover a scene change or blast my way over the top of tap dancers.
Probably the only thing that saves the mix at this point is that the band themselves are varying wildly in their tone from number to number; mezzo from the brass and brushes from the drums, then suddenly an ear-tearingly crisp side-stick and a straining forte passage from the brass. So it isn't quite as obvious that their sound also alters dramatically every time the set moves, or a dance starts....