On Recording the Human Voice

Recording the human speaking voice can be one of the trickiest tasks a professional sound recordist encounters. Even when working with seasoned professional voice artists, problems can creep in. Here are a few of them and how to solve each one.

Let’s begin by clarifying one thing. I don’t mean a singing vocalist, but a speaking voice — perhaps an announcer, a narrator, a person doing a commercial or even someone recording a book on tape. We are talking everyday speech.

First of all, the voice must sound natural and be clear and understandable. This is not music. Special efforts to manipulate the voice are not allowed. No masking is allowed either. We are talking about the purity of the human voice here.

Courtesy Alt Recording Studios.

Normally, in such situations, the voice talent sits or stands in front of a microphone in a treated studio or voice-over booth. It can be at a broadcast station, a recording studio or even on-location with the proper acoustic treatment. The problems we must deal with here occur even under ideal recording conditions.

The first situation that can occur is sibilance, a manner of articulation of fricative and affricate consonants. Sibilance occurs when a stream of air is directed with the tongue toward the sharp edge of the teeth. It causes a sibilant — or strident — sound.

Sibilance is an unpleasant tonal harshness that can happen during consonant sounds (like S, T and Z), caused by disproportionate audio dynamics in the upper midrange frequencies. Sibilance is often centered between 5 kHz and 8 kHz, but can occur well above that frequency range.

This problem is usually caused by the actual vocal formant, but can also be exaggerated by microphone placement and technique. Every human voice is different, so don’t presuppose that anything you’ve tried before will or will not work again. It’s all up for grabs.

The best way to start is to leave some space, about 12 to 18 inches, between the speaker and the microphone. Forget a pop filter here; it won’t help. Once you find a suitable microphone and distance combination that reduces sibilance, point the microphone downward 10 to 15 degrees, toward the throat instead of directly at the mouth. Another good tip is to change out the type of microphone. Dynamic mics often work when condensers don’t in these situations.

If electronics are required, de-essers are the tools of choice. The de-esser technique typically uses a narrow peak EQ in the sidechain to boost the most offensive sibilant frequencies. This EQ exaggerates the dynamic difference between the sibilant band and the rest of the vocal waveform, making it much easier to achieve gain reduction during those consonants.
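For readers who like to see the idea in code, here is a minimal sketch of the sidechain approach described above. It is not how any particular commercial de-esser works. The function names, the 6.5 kHz center frequency, the 12 dB boost, the threshold, the ratio and the attack and release times are all illustrative assumptions, and the peak EQ uses the widely published biquad "cookbook" peaking form. The boosted copy of the signal is used only to decide when to turn the voice down; the gain reduction is applied to the untouched original.

```python
import math


def peaking_eq(samples, fs, f0, gain_db, q):
    """Boost a band around f0 with a biquad peaking filter (cookbook form)."""
    a = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b0, b1, b2 = 1 + alpha * a, -2 * math.cos(w0), 1 - alpha * a
    a0, a1, a2 = 1 + alpha / a, -2 * math.cos(w0), 1 - alpha / a
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = (b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def de_ess(samples, fs, f0=6500.0, boost_db=12.0, q=2.0,
           threshold_db=-6.0, ratio=4.0, attack_ms=1.0, release_ms=50.0):
    """Reduce gain on the dry signal whenever the boosted sidechain is loud."""
    # Sidechain: same audio, with the assumed sibilant band exaggerated.
    sidechain = peaking_eq(samples, fs, f0, boost_db, q)
    attack = math.exp(-1.0 / (attack_ms * 0.001 * fs))
    release = math.exp(-1.0 / (release_ms * 0.001 * fs))
    env = 0.0
    out = []
    for dry, key in zip(samples, sidechain):
        # Envelope follower: fast rise, slow fall.
        level = abs(key)
        coeff = attack if level > env else release
        env = coeff * env + (1.0 - coeff) * level
        # Gain computer: compress only the part above the threshold.
        over_db = 20 * math.log10(max(env, 1e-9)) - threshold_db
        reduction_db = over_db * (1 - 1 / ratio) if over_db > 0 else 0.0
        out.append(dry * 10 ** (-reduction_db / 20))
    return out


# Hypothetical test signal: a 200 Hz "vowel" tone, then a 6.5 kHz "ess" tone.
fs = 48000
n = fs // 2
half = n // 2
signal = [0.3 * math.sin(2 * math.pi * (200 if i < half else 6500) * i / fs)
          for i in range(n)]
processed = de_ess(signal, fs)
```

In this sketch both tones start at the same level, yet only the high tone is pulled down, because the sidechain boost pushes it over the threshold while the low tone stays under it. That is the "exaggerated difference" at work. Real voices will need the frequency and threshold tuned by ear.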

Another vocal issue that can develop into a problem is plosives: blasts of air that result from certain consonant sounds, usually heard on words with Ps and Bs. This is where a pop shield does help. Position it a couple of inches from the mic and cross your fingers.

Plosives can be especially bad with cardioid or hypercardioid mics and can cause the diaphragm to bottom out, hit the backplate insulator and cause mechanical clipping. This is bad and can ruin a recording. In this case, try a mic with an omnidirectional pickup pattern, which can lessen the effect. Sometimes, though, plosives are unavoidable.

If electronics are needed to fix plosives, try iZotope’s De-Plosive module in RX Advanced. As with all such problems, though, it is best to solve this one in the recording session rather than depend on electronic solutions in post.

Finally, there are the assorted pops, clicks, smacks, swallows and other odd sounds that creep into human speech. These tend to come into play when working with anyone other than top-tier trained voiceover artists. They can happen at any moment and often test the skill set of the engineer doing the recording session.

These oddball sounds fall under the idiosyncrasies of human speech. Dealing with them depends on the talent and professionalism of the person doing the recording more than anyone else. It may involve working with the voice talent to address the problem and making sure the person is well hydrated before the recording session. It is always good to have hot tea, lemon and honey on the set to help soothe the voice.

Of course, switching mics and other gear can help, but in the end iZotope’s RX Advanced and Waves’ modules can also help save the day. Editing with these tools has become the go-to fix for many tiny, indescribable problems.

Recording the human voice has never been easy. It tests the skills of every recordist. When you think you’ve seen it all, there is something new waiting in the wings to test you again.
