April 27 marked the 80th anniversary of a
milestone in the history of audio. On this date in 1933, the
Philadelphia Orchestra under deputy conductor Alexander Smallens was
picked up by three microphones at the Academy of Music in
Philadelphia—left, center, and right of the orchestra stage—and the
audio transmitted over wire lines to Constitution Hall in Washington,
where it was replayed over three loudspeakers placed in similar
positions to an audience of invited guests. Music director Leopold
Stokowski manipulated the audio controls at the receiving end in
Washington.
This historic event was reported and
analyzed by audio pioneers Harvey Fletcher, J.C. Steinberg and W.B.
Snow, E.C. Wente and A.L. Thuras, and others, in a collection of six
papers published in January 1934 as the
Symposium on Auditory Perspective in
Electrical Engineering, the journal of the American Institute of Electrical Engineers (AIEE). Paul Klipsch referred to the Symposium as “one of the most important papers in the field of audio.”
April 27, 1933: Leopold Stokowski at the controls with Harvey Fletcher observing
Prior to 1933, Fletcher had been working on
what has since been termed the wall of sound. “Theoretically, there
should be an infinite number of such ideal sets of microphones and sound
projectors [i.e., loudspeakers] and each one should be infinitesimally
small,” he wrote.
Fletcher’s dual curtains of microphones and loudspeakers
Fletcher continued, “Practically, however,
when the audience is at a considerable distance from the orchestra, as
usually is the case, only a few of these sets are needed to give good
auditory perspective; that is, to give depth and a sense of
extensiveness to the source of the music.”
In this regard, Floyd Toole’s
conclusions—following a career spent researching loudspeakers and
listening rooms—are especially noteworthy. In his 2008 magnum opus,
Sound Reproduction: Loudspeakers and Rooms,
Toole noted that the “feeling of space”—apparent source width plus
listener envelopment—which turns up in the research as the largest
single factor in listener perceptions of “naturalness” and
“pleasantness,” two general measures of quality, is increased by the use
of surround loudspeakers in typical listening rooms and home theatres.
Given that these smaller spaces cannot be
compared in either size or purpose to concert halls where sound is
originally produced, Toole noted that in the 1933 experiment, “there was
no need to capture ambient sounds, as the playback hall had its own
reverberation.”
Localization Errors
Recognizing that two- and three-channel systems were “far less
ideal arrangements,” Steinberg and Snow nevertheless observed that
“the 3-channel system was found to have an important advantage over
the 2-channel system in that the shift of the virtual position for
side observing positions was smaller.”
In other words, for listeners away from the
sweet spot along the hall’s center axis, localization errors due to
shifts in the phantom images between loudspeakers were smaller in the
case of a Left-Center-Right system compared with a Left-Right system.
Significantly, Fletcher did not include
localization along with “depth and a sense of extensiveness” among the
characteristics of “good auditory perspective.”
Regarding localization, Steinberg and Snow realized that
“point-for-point correlation between pick-up stage and virtual stage
positions is not obtained for 2- and 3-channel systems.” Further, they
concluded that the listener “is not particularly critical of the exact
apparent positions of the sounds so long as he receives a spatial
impression. Consequently 2-channel reproduction of orchestral music
gives good satisfaction, and the difference between it and 3-channel
reproduction for music probably is less than for speech reproduction or
the reproduction of sounds from moving sources.”
The 1933 experiment was intended to
investigate “new possibilities for the reproduction and transmission of
music,” in Fletcher’s words. Yet many, if not most, of the developments
in multichannel sound have been motivated and financed by the film
industry, in the wake of Hollywood’s massive investment in the
“talkies,” which sounded the death knell of vaudeville and led to the
conversion of a great many theatres into cinemas.
Given that the growth of the audio industry
stemmed from research and development into the reproduction and
transmission of sound for the burgeoning telephone, film, radio,
television, and recorded music industries, it is curious that the term
“theatre” continued (and still continues to this day) to be applied to
the buildings and facilities of both cinemas and theatres. This reflects
the confusion not only in their architecture, on which the noted
theatre consultant Richard Pilbrow commented in his wonderful 2011
memoir
A Theatre Project, but also in the development of their respective audio systems.
Theatre is Not Cinema: The Differing Requirements of Speech Reinforcement
Sound reinforcement was an early offshoot,
eagerly adopted by demagogues and traveling salesmen alike to bend
crowds to their way of thinking; yet, as Don Davis noted in 2013 in
Sound System Engineering,
“Even today, the most difficult systems to design, build, and operate
are those used in the reinforcement of live speech. Systems that are
notoriously poor at speech reinforcement often pass reinforcing music
with flying colors. Mega churches find that the music reproduction and
reinforcement systems are often best separated into two systems.”
The difference lies partly in the relatively low channel count of
audio reproduction systems, which makes localization of talkers next to
impossible. Since delayed loudspeakers were widely introduced into the
live sound industry in the 1970s, they have been used almost
exclusively to reinforce the main house sound system, not the performers
themselves. This undoubtedly arose from the sheer magnitude of the
sound pressure levels involved in the stadium rock concerts and outdoor
festivals of the era.
However, in the case of, say, an opera singer, the depth, sense of
extensiveness, and spatial impression that lent appeal to the reproduced
sound of the symphony orchestra back in 1933, likely won’t prove
satisfying in the absence of the ability to localize the sound image of
the singer’s voice accurately. Perhaps this is one reason why
“amplification” has become such a dirty word among opera aficionados.
In the 1980s, however, the English theatre sound designer Rick Clarke
and others began to explore techniques of making sound appear to
emanate from the lips of performers rather than from loudspeaker boxes.
They were among a handful of pioneers who used the psychoacoustics of
delay and the Haas effect “to pull the sound image into the heart of the
action,” as sound designer David Collison recounted in his 2008 volume,
The Sound of Theatre.
Out Board Electronics
in the UK has since taken up the cause of speech sound reinforcement
with a unique delay-based input-output matrix in its TiMax2 Soundhub.
Each performer’s radio mic can be fed, if necessary, to dozens of
loudspeakers arrayed throughout the house, with unique levels and
delays to each loudspeaker, such that more than 90 per cent of the
audience is able to localize the voice back to the performer via Haas
effect-based perceptual precedence, no matter where they are seated.
Out Board refers to this approach as source-oriented reinforcement
(SOR).
The delay matrix approach to SOR originated in the former DDR (East
Germany), where in the 1970s, Gerhard Steinke, Peter Fels and Wolfgang
Ahnert introduced the concept of Delta-Stereophony in an attempt to
increase loudness in large auditoriums without compromising directional
cues emanating from the stage. In the 1980s, Delta-Stereophony was
licensed to AKG and embodied in the DSP 610 processor. While it offered
only six inputs and 10 outputs, it came at the price of a small house.
Out Board started working on the concept in the early 1990s and
released TiMax (now known as TiMax Classic) around the middle of the
decade, progressively developing and enlarging the system up to the 64 x
64 input-output matrix, with 4,096 cross points, that characterizes the
current generation, TiMax2.
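The core mechanism described above, a matrix in which every input-output crosspoint carries its own level and delay, can be illustrated with a minimal sketch. The class, sample rate, and structure here are assumptions for illustration only, not Out Board's implementation:

```python
# Minimal sketch of a delay-based input-output matrix (illustrative only;
# not TiMax2's actual architecture or code).

SAMPLE_RATE = 48000  # samples per second (assumed)

class DelayMatrix:
    """Each (input, output) crosspoint has its own gain and delay."""

    def __init__(self, n_inputs, n_outputs):
        self.n_inputs = n_inputs
        self.n_outputs = n_outputs
        # Per-crosspoint gain (linear) and delay (in samples)
        self.gain = [[0.0] * n_outputs for _ in range(n_inputs)]
        self.delay = [[0] * n_outputs for _ in range(n_inputs)]

    def set_crosspoint(self, i, o, gain, delay_ms):
        """Set gain and delay (in milliseconds) from input i to output o."""
        self.gain[i][o] = gain
        self.delay[i][o] = int(round(delay_ms * SAMPLE_RATE / 1000.0))

    def process(self, inputs):
        """Mix equal-length per-input sample lists into per-output lists,
        applying each crosspoint's gain and delay."""
        n = len(inputs[0])
        outputs = [[0.0] * n for _ in range(self.n_outputs)]
        for i in range(self.n_inputs):
            for o in range(self.n_outputs):
                g, d = self.gain[i][o], self.delay[i][o]
                if g == 0.0:
                    continue  # unused crosspoint
                for t in range(d, n):
                    outputs[o][t] += g * inputs[i][t - d]
        return outputs

# Usage: one mic feeding two loudspeakers, the second quieter and later.
dm = DelayMatrix(1, 2)
dm.set_crosspoint(0, 0, 1.0, 0.0)   # near speaker: full level, no delay
dm.set_crosspoint(0, 1, 0.5, 1.0)   # far speaker: -6 dB, 1 ms later
```

A 64 x 64 matrix of this kind yields the 4,096 independently delayed and scaled crosspoints mentioned above.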
The TiMax Tracker, an ingenious radar-based location system, locates
performers to within six inches in any direction, so that the system can
interpolate softly between pre-established location image definitions
in the Soundhub for up to 24 performers simultaneously. The audience is
thereby enabled to localize performers’ voices accurately as they move
around the stage, or up and down on risers, thus addressing the
deficiency of conventional systems regarding the localization of both
speech and moving sound sources.
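The “soft interpolation” between pre-established location image definitions can be pictured as a crossfade of per-loudspeaker gain and delay settings as a performer moves between two defined positions. This is a hypothetical illustration of the general idea, not TiMax's actual algorithm:

```python
# Illustrative sketch: blend two "image definitions" (per-loudspeaker
# (gain, delay_ms) pairs) as a performer moves from position A to B.
# Linear blending is an assumption for clarity, not TiMax's method.

def interpolate_image(image_a, image_b, x):
    """Return a blended image for position fraction x in [0, 1],
    where 0 gives image_a and 1 gives image_b."""
    out = {}
    for spk in image_a:
        ga, da = image_a[spk]
        gb, db = image_b[spk]
        out[spk] = (ga + (gb - ga) * x,
                    da + (db - da) * x)
    return out

# Usage: performer halfway between stage-left and stage-right images.
stage_left = {"L": (1.0, 0.0), "R": (0.2, 12.0)}
stage_right = {"L": (0.2, 12.0), "R": (1.0, 0.0)}
halfway = interpolate_image(stage_left, stage_right, 0.5)
```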
Source-Oriented Reinforcement
Out Board director Dave Haydon put it this way: “First thing to know
about source-oriented reinforcement is that it’s not panning. Audio
localization created using SOR makes the amplified sound actually appear
to come from where the performers are on stage. With panning, the sound
usually appears to come from the speakers, but biased to relate roughly
to a performer’s position on stage. Most of us are also aware that
level panning only really works for people sitting near the center line
of the audience. In general, anybody sitting much off this center line
will mostly perceive the sound to come from whichever stereo speaker
channel they’re nearest to.
“This happens because our ear-brain combo localizes to the sound we
hear first, not necessarily the loudest. We are all programmed to do
this as part of our primitive survival mechanisms, and we all do it
within similar parameters. We will localize even to a 1 ms early
arrival, all the way up to about 25 ms, then our brain stops integrating
the two arrivals and separates them out into an echo. Between 1 ms and
about 10 ms arrival time differences, there will be varying coloration
caused by phasing artifacts.
“This localization effect, called precedence or Haas Effect after the
scientist who discovered it, works within a 6-8 dB level window. This
means the first arrival can be up to 6-8 dB quieter than the second
arrival and we’ll still localize to it. This is handy as it means we can
actively apply this localization effect and at the same time achieve
useful amplification.
“If we don’t control these different arrivals they will control us.
All the various natural delay offsets between the loudspeakers,
performers and the different seat positions cause widely different
panoramic perceptions across the audience. You only have to move 13
inches to create a differential delay of 1 ms, causing significant image
shift. Pan pots controlling only level can’t fix this for more than a
few audience members near the center. You need to manage delays, and
ideally control them differentially between every mic and every speaker,
which requires a delay-matrix and a little cunning, coupled with a
fairly simple understanding of the relevant physics and biology,” Haydon
said.
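The arithmetic Haydon describes is simple to sketch: path-length differences convert to arrival-time differences at the speed of sound, and the precedence effect holds only within the windows he quotes (roughly 1 to 25 ms, and a 6 to 8 dB level trade). The function names and the choice of 8 dB as the cutoff are illustrative assumptions:

```python
# Sketch of the Haas-effect arithmetic quoted above; thresholds are the
# approximate figures from the text, not exact psychoacoustic constants.

SPEED_OF_SOUND = 343.0  # metres per second at room temperature

def delay_ms(distance_m):
    """Propagation delay in milliseconds over a given path length."""
    return distance_m / SPEED_OF_SOUND * 1000.0

def localizes_to_first(arrival_gap_ms, first_quieter_by_db):
    """True if a listener would localize to the earlier arrival:
    the gap falls in the ~1-25 ms precedence window, and the first
    arrival is no more than ~8 dB quieter than the second."""
    return 1.0 <= arrival_gap_ms <= 25.0 and first_quieter_by_db <= 8.0

# Moving ~13 inches (0.33 m) changes the path length by about 1 ms:
print(round(delay_ms(0.33), 2))  # prints 0.96
```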
Into the Mainstream
More and more theatres are adopting this approach, including New
York’s City Center and the UK’s Royal Shakespeare Company. A number of
Raymond Gubbay productions of opera-in-the-round at the notoriously
difficult Royal Albert Hall—including
Aida, Tosca, The King and I, La Bohème and
Madam Butterfly—as well as
Carmen at the O2 Arena, have benefited from source-oriented reinforcement, as have recent productions of
Les Misérables, Jesus Christ Superstar, Into the Woods, Beggar’s Opera, Marie Antoinette, Andromache, Tanz der Vampire, Lord of the Flies, Fela!, and many others at venues around the world.
Veteran West End sound designer Gareth Fry employed the technique earlier this year at the Barbican Theatre for
The Master and Margarita,
to make it possible for all audience members to continuously localize
to the actors’ voices as they moved around the Barbican’s very wide
stage. He noted that, in the three-hour show with a number of parallel
story threads, this helped greatly with intelligibility to ensure the
audience’s total immersion in the show’s complex plot lines.
Based on the experience, Fry said, “I’m quite sure that in the coming
years, SOR will be the most common way to do vocal reinforcement in
drama.”
As we mark the 80th anniversary of that historic first live stereo
transmission, it’s worth noting that, in spite of the proliferation of
surround formats for sound reproduction that has to date culminated in
the cinematic marvel of 64-channel Dolby Atmos, we are only now getting
onto the right track with regard to speech reinforcement.
It’s about time.
(photo source: http://www.stokowski.org)