Every now and then, when I am mixing a show, I connect something to the board to take a recording of what is passing through it. Sometimes a performer, or a videographer, asks. Other times it is just for my own continuing education (to review off-line the sonic choices I'd made during the show).
But when I email an mp3 to a soloist or a band, it isn't as simple as pressing "record" during the show. There is a fair amount of work I have to do off-line, and some difficult artistic choices to make.
The first problem lies in the nature of the show itself. These are not recording sessions. We don't have the luxury of careful adjustment of microphones and player technique, or multiple takes. We have to deal with what happens in a live performance, with the mistakes, the crying babies in the audience -- and the compromises of getting set up on multiple performers with a small equipment budget and little time between acts.
The show is for the audience, not for recording. What is going into the board, and through the board, is optimized for playback on speakers.
Consider the following example: a trumpet and piano are playing. The trumpet is loud, and needs no help to be heard at the back of the hall. The piano is relatively soft, and needs amplification to be heard over the trumpet. This means that what the audience probably heard was more trumpet than piano, but what the sound board "heard" was much more piano than trumpet. If all you did was plug a recorder into the main outputs of the board, you would hear a very peculiar concert indeed.
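To put rough numbers on the trumpet and piano example, here is a minimal sketch in Python. Every level in it is a made-up figure chosen for illustration, not a measurement from any real show, and the names `db_sum`, `acoustic`, `pa` and `balance` are my own. It assumes the stage sound and the speaker sound add as uncorrelated sources, which is only an approximation.

```python
import math

def db_sum(*levels_db):
    """Combine uncorrelated sound sources given in dB SPL (power sum)."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels_db))

# Hypothetical levels at a seat in the house, in dB SPL.
acoustic = {"trumpet": 95.0, "piano": 78.0}  # sound coming straight off the stage
pa = {"trumpet": 70.0, "piano": 90.0}        # sound added by the speakers

def balance(levels):
    """Trumpet level minus piano level in dB (positive means more trumpet)."""
    return levels["trumpet"] - levels["piano"]

# The audience hears stage sound and speaker sound together.
heard = {name: db_sum(acoustic[name], pa[name]) for name in acoustic}

print("audience balance:", round(balance(heard), 1), "dB")
print("board balance:   ", round(balance(pa), 1), "dB")
```

With these invented figures the audience balance comes out positive (more trumpet) and the board balance comes out strongly negative (far more piano), which is the mismatch a recorder on the main outputs would capture.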
Even more so, the bass and drums might not be on mic at all (the drums are loud enough on their own, and the bass has his own amplification). You might choose to take a DI (a direct connection) from the bass, bypassing his amp, but by itself that sounds nothing like what is coming from the bass cabinet during the show.
Depending on the complexity of the sound board, there may be options to split the signal in various ways, getting taps off the various microphones before the level and equalization that adjust what is being sent towards the audience. But, again, these are secondary; the primary need is to keep the clear signal to the audience (and to the performers' monitors) and recording must subsist on handouts.
And then you have the nature of acoustic space. What it sounded like to a human sitting in that audience is NOT what it sounds like to even the best microphone placed in that chair. (The partial exceptions are binaural recordings played back through headphones, or certain area mic set-ups taken in a controlled studio environment.)
A large part of the reason is focus. When you are sitting in the space yourself, you have visual and subtle audio cues that help you sort out the sound picture. You can see the band is over THERE. You can see that the person who stood up for a solo is holding a clarinet. You can see that the coughing is coming from a man near the aisle. This all helps you tune in on the sounds you want, and tune out the sounds you don't.
There is a miasma of subtler sounds, sounds we usually ignore: cloth rustling, low voices, feet shuffling, breathing. We can usually use our ability to focus within a real acoustic space to take these out of our perceptual picture. We are very good at that: at carrying on a conversation around an annoying noise or background music, at filtering out background noise. A microphone does not do that; it captures what is actually in that space.
(This is a particular problem for sampling the sounds of real artifacts for sound design. There are always other sounds intruding, sounds like distant traffic or air conditioner or passing plane, that you normally filter out of your mind. If they get on the recording, though, they often spoil it. The worst problem is when people without trained ears try to "help." They haven't learned how to unfocus their perceptual field -- they inevitably hand you back recordings filled with background noise they simply didn't perceive themselves.)
The ear has a couple of perceptual tricks to help it do this. One is the physical structure of the ears themselves. Between the stereo of a pair of ears, and the subtle equalization and phase distortions introduced by the pinnae (the external ear), and the skull, sinuses, and head itself, the human brain is able to sort out extremely subtle sonic cues about direction, distance, and qualities about the surrounding space.
When you walk into an environment, your ears take a quick measurement of what is natural and how the room reflects and absorbs. Your brain then compensates. Essentially, it notices that sounds are muffled in one room, and turns up the high end to compensate. Or it notices that there are a lot of reflections off the polished marble walls, and it decides to ignore them when localizing sounds.
The binaural recording technique is to stick a pair of microphones inside a dummy head, with plastic ears (molded from a real pair of ears). When done right the effects are spectacular; it reproduces so many of these subtle cues you can actually describe the shape and wall material of the room the recording was made in (for such information is indeed gathered by our ears).
One last trick ears have that even binaural mics don't: they move. The ear and brain can make lightning-fast comparisons between the different phase interactions that occur at sub-centimeter spacing (depending on the frequency of interest!) and thus compensate for much of the effect of room nodes and destructive reflection. A microphone, being immobile, is stuck with the comb filtering the natural acoustics impose on its exact location.
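For the curious, the comb filtering a fixed microphone suffers can be sketched with a little arithmetic. This is a simplified model of my own, assuming just one reflection arriving alongside the direct sound and a speed of sound of about 343 m/s; a real room has many reflections. Cancellation falls wherever the extra path length is an odd number of half wavelengths. The function name `notch_frequencies` and the example distances are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # metres per second, roughly, in room-temperature air

def notch_frequencies(path_difference_m, max_hz=20000.0):
    """Frequencies (Hz) cancelled when a single reflection travels
    path_difference_m further than the direct sound."""
    if path_difference_m <= 0:
        raise ValueError("path difference must be positive")
    delay = path_difference_m / SPEED_OF_SOUND  # seconds
    notches = []
    k = 0
    while True:
        # Cancellation where the delay equals an odd number of half periods.
        f = (2 * k + 1) / (2 * delay)
        if f > max_hz:
            break
        notches.append(f)
        k += 1
    return notches

# Hypothetical case: the reflection travels 1.00 m further than the direct sound,
# then the microphone is nudged so the difference becomes 1.01 m.
before = notch_frequencies(1.00)
after = notch_frequencies(1.01)
print("first notch:", round(before[0], 1), "Hz ->", round(after[0], 1), "Hz")
print("30th notch: ", round(before[29]), "Hz ->", round(after[29]), "Hz")
```

In this model a 1.00 m path difference puts the first notch at about 171.5 Hz, with further notches every 343 Hz above it. Changing the path by a single centimetre barely moves the low notches but shifts the ones up around 10 kHz by roughly a hundred hertz, which is why small movements matter most at high frequencies.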
Binaural recordings, no matter how clever, are far from optimal for the recording you made of yourself at Friday's performance. Your friend doesn't want to don earphones and close their eyes and go through the several minutes of adaptation to the captured aural environment. They want to load an mp3 on to their iPod, or open it right from the email, and listen to it on computer speakers or whatever they have handy.
Which means what you want to produce for these "off the board" recordings is not what the audience heard, either. What I aim to produce, in fact, is the emotional effect of being in that audience listening to that performance.
This is rather akin to motion picture sound. If you've had the chance to listen to the actual location audio (especially, that is, non-professional location audio, aka "what a microphone mounted on the camera heard"), you know just how horrible it sounds compared to a movie soundtrack.
It isn't just that the movies can get their hands on $20,000 Neumanns. The essential difference again is one of focus.
Motion picture dialog, like book dialog, may look like human speech but it is compressed. Writers omit most of the hems and haws and back-tracking and repetitions of real speech to create dialog that (mostly!) gets to the point and communicates plot-essential information quickly and accurately. Motion picture images have the same compression; it may be a baroque image full of movement and detail, but everything in it is carefully selected and subservient to the story-telling.
So, too, is sound focused. Instead of recording all the natural noise in the environment of the shot at hand, they take selected elements and push them, creating a caricature, a cartoon, a schematic of the real thing.
In the real world, walk across a room and open a door. Unless you are wearing heels and on hardwood or stone your footsteps are probably a muffled fumbly sound barely rising above the background noise, and the door lock makes a low indistinct sound that could be anything. Compare to a Hollywood movie, where the footsteps are clear and defined and the lock sings a little clearly metallic and mechanical song with a rising tone that says "I'm opening."
Which shades us subtly into not just the choices of clear story-telling, but the use of perspective in movies; how what you hear in a movie is guided strongly by how you are supposed to react emotionally, and by that strange game of "who are you being?" as you watch the film. To over-simplify, you hear a loud gun click because the gun is dangerous and important. You hear the clothes movement and footsteps of James Bond because you are BEING James Bond and you'd hear your own clothes move.
But back to recording!
My steps in re-creating the emotional effect of the performance are: first, to get the most isolated and direct sound I can (to remove as many distracting noises as possible); second, to tailor the sounds towards a specific effect; and third, to put back in an artificial and select approximation of the performance environment, using such tools as computer-generated reverb and a select amount of sound captured from area mics and audience mics.
There is always a balance between honesty/accuracy and listen-ability. Consider the -- entirely hypothetical! -- case of a singer accompanied by an amateur guitar-playing friend. Who played badly. Honesty towards what the audience heard, and honesty towards the performer who might want to know what they actually sounded like in order to learn and improve, calls for leaving that poor guitar part in un-altered.
On the actual performance night, however, natural flaws of the space masked many of the flaws of the guitar. And the audience was sympathetic towards the obviously struggling young guitarist, and was willing to forgive him. So to capture the performance impact and the emotional effect of that performance, you DO want to clean up the guitar a little.
And then, the song is the point, and the vocal performer is the person who asked you for a recording. That would seem to put the emphasis on making them sound good, even if that means hiding elements (like the guitar) that distract from their performance.
All of this, I have to balance. And one more thing, perhaps even more important than the rest.
A typical show may have twenty different acts. When I can, I take multi-track recordings on to hard disk, with as many individual feeds as I can split off from the sound board plus as many additional spot and area mics as I can hide in the performance space (quite literally...I once stuck a PCC under a choral riser to pick up a little of a drum).
So there are rather over one hundred individual sound snippets, with working files taking up 6-10 gigs of hard drive space, and almost two hours of actual songs to work through. And I'm not getting paid for this; it is strictly as a favor to my friends among the performers and for my own amusement.
So the last balancing act is between the kind of tweaking and tailoring that might improve the recording one more tiny bit, and getting the file done and moving on to the next song before midnight arrives. Even a rough mix of an entire night's worth of material takes a couple of days to do. Doing more does not make sense.