The 50ms Loop: A Broadcast Engineer’s Confusion with AV Standard Practice

By: Anthony Kuzub, a Confused Broadcast Engineer

I’ve spent the last twenty years in OB vans and broadcast control rooms. In my world, “latency” is a dirty word (I prefer to call them “production offsets”). If a signal is one frame out of sync, we panic. If a host hears their own voice in their IFB (earpiece) with even a 10ms delay, they stop talking and start shouting at us. You can judge an in-ear mix by how far they throw the beltpack.

So you can imagine my confusion when I recently stepped into the commercial AV world to help commission a high-end boardroom.

I opened a client’s DSP file, a standard configuration for a room with ceiling mics and local voice lift, and I saw something that made me think I was reading the schematic wrong.

There were Acoustic Echo Cancellation (AEC) blocks on everything.

Not just on the lines going to the Zoom call (where they belong), but on the microphones being routed to the local in-room speakers. I turned to the lead AV integrator and asked a genuine question:
“Why are we running the podium mic through an echo canceller just to send it to the ceiling speakers five feet away?”
His answer was, “To stop the echo.”
And that is when my brain broke.

The Chicken, The Egg, and The Latency

In broadcast, we follow a simple rule: Signal flow follows physics.

If I am standing in an empty room and I speak, there is no electronic echo. If I turn on a microphone and send it to a speaker with zero processing, there is still no echo—there might be feedback (squealing) if I push the gain too high, but there is no distinct, slap-back echo.

So, I looked closer at the AEC block in the software.

• Processing Time: ~20ms to 50ms (depending on the buffer and tail length).
Suddenly, the math hit me. By inserting this “safety” tool into the chain, we were effectively delaying the audio by up to 50ms, about one and a half video frames at 29.97 fps, before it even hit the amplifier.
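For anyone who thinks in frames rather than milliseconds, the conversion is simple enough to sketch. This is only back-of-envelope arithmetic; the function name and the list of frame rates are my own choices for illustration, not anything taken from the DSP software.

```python
def latency_in_frames(latency_ms: float, fps: float) -> float:
    """Convert a latency in milliseconds to a duration in video frames."""
    return latency_ms / 1000.0 * fps


if __name__ == "__main__":
    # Common broadcast frame rates, picked here purely for illustration.
    for fps in (25.0, 29.97, 50.0, 59.94):
        frames = latency_in_frames(50.0, fps)
        print(f"50 ms at {fps:g} fps = {frames:.2f} frames")
```

At 25 fps, 50ms works out to 1.25 frames; at the higher rates it is proportionally more. Either way, it is a delay no broadcast engineer would accept on a live monitor path.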

Here is the loop I saw:

1. The presenter speaks.
2. The DSP holds that audio for 50ms to “process” it.
3. The audio comes out of the ceiling speakers 50ms late.
4. The microphones at the back of the room hear that delayed sound.
5. Because 50ms is well beyond the Haas Effect integration zone, the system (and the human ear) perceives this as a distinct second arrival. A slap-back. An echo.
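The loop above can be roughed out numerically. What follows is a minimal sketch, not a measurement: the 30ms fusion limit is an assumed rule of thumb for speech (quoted figures for the Haas Effect vary), the room distances are hypothetical, and I am assuming a close-miked presenter so the mouth-to-mic delay is ignored.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate value for air at about 20 °C
FUSION_LIMIT_MS = 30.0  # assumed rule of thumb for speech, not a hard threshold


def acoustic_delay_ms(distance_m: float) -> float:
    """Time for sound to travel the given distance through air, in ms."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


def arrival_gap_ms(dsp_latency_ms: float,
                   talker_to_listener_m: float,
                   speaker_to_listener_m: float) -> float:
    """Gap between the reinforced arrival and the direct voice at the listener.

    Positive means the loudspeaker sound arrives after the direct voice.
    """
    direct = acoustic_delay_ms(talker_to_listener_m)
    reinforced = dsp_latency_ms + acoustic_delay_ms(speaker_to_listener_m)
    return reinforced - direct


def is_heard_as_echo(gap_ms: float) -> bool:
    """True when the two arrivals are too far apart to fuse into one sound."""
    return abs(gap_ms) > FUSION_LIMIT_MS


if __name__ == "__main__":
    # Hypothetical seat: 6 m from the presenter, 1.5 m below a ceiling speaker.
    for latency_ms in (50.0, 2.0):
        gap = arrival_gap_ms(latency_ms, 6.0, 1.5)
        print(f"DSP latency {latency_ms:g} ms: gap {gap:.1f} ms, "
              f"echo={is_heard_as_echo(gap)}")
```

With these assumed distances, the 50ms path lands outside the fusion limit and the 2ms path lands well inside it. Different seats and speaker positions will shift the numbers, but the DSP latency dominates the result.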

Creating the Problem to Fix the Problem

I realized that in this room design, the AEC wasn’t curing the echo; it was the source of it.
Because the system was generating a delayed acoustic signal, the other microphones in the room were picking up that delay. The integrator’s solution? “Oh, just put AEC on those back mics too.”
It felt like watching a doctor break a patient’s leg just so they could bill them for a cast.
In the broadcast world, we use “Mix-Minus” (or N-1). If a signal doesn’t need to go to a destination, you don’t send it. If a signal doesn’t need processing, you bypass it. You strip the signal path down to the copper.
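For readers who haven’t met mix-minus before, here is a minimal sketch of the idea: every destination gets the sum of all sources except its own. The source names and levels are hypothetical, and a real console does this per audio sample on a bus matrix rather than on a dictionary of numbers.

```python
def mix_minus_feeds(sources: dict[str, float]) -> dict[str, float]:
    """Build an N-1 feed for each source: the sum of every *other* source."""
    return {
        name: sum(level for other, level in sources.items() if other != name)
        for name in sources
    }


if __name__ == "__main__":
    # Hypothetical instantaneous levels for three contributors.
    sources = {"host": 0.5, "guest": 0.25, "far_end": 0.125}
    for name, feed in mix_minus_feeds(sources).items():
        print(f"{name} hears {feed}")
```

Nobody receives their own signal back, so there is nothing to cancel and nothing to delay.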

The “Empty Room” Test

I proposed a crazy idea to the team. I asked them to imagine the room completely empty. No Zoom call. No Microsoft Teams. Just a guy standing at a podium speaking to people in chairs.
• Is there a remote caller? No.
• Is there a far-end reference signal? No.
• Is there a need to cancel anything? No.

If we simply bypassed the AEC block for the local reinforcement, the latency dropped from 50ms down to about 2ms. At 2ms, the sound from the speakers arrives at the listener’s ear almost simultaneously with the actual voice of the presenter. The “echo” vanishes.
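The same decision can be written down as a routing rule: only insert the AEC block when there is a far-end reference to cancel. This is a sketch under stated assumptions. The block names are hypothetical, and the per-block latencies are illustrative values I chose so the totals match the 2ms and 50ms figures above; they are not measurements from any particular DSP.

```python
# Illustrative per-block latencies in ms (assumed values, not measurements).
BLOCK_LATENCY_MS = {
    "converters_and_routing": 2.0,
    "aec": 48.0,
}


def signal_path(has_far_end_reference: bool) -> list[str]:
    """Return the processing blocks for a mic feed.

    The AEC block is included only when a far-end reference signal exists.
    """
    path = ["converters_and_routing"]
    if has_far_end_reference:
        path.append("aec")
    return path


def path_latency_ms(path: list[str]) -> float:
    """Total latency of a path as the sum of its block latencies."""
    return sum(BLOCK_LATENCY_MS[block] for block in path)


if __name__ == "__main__":
    local = signal_path(has_far_end_reference=False)
    conference = signal_path(has_far_end_reference=True)
    print("local voice lift:", local, path_latency_ms(local), "ms")
    print("send to far end: ", conference, path_latency_ms(conference), "ms")
```

The local voice lift path never touches the canceller, so it never pays for it. The conference send still gets AEC, where the added delay is harmless to people in the room.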

The system became stable not because we added more processing, but because we stopped fighting physics.

A Plea from the Control Room

I’m still learning the ropes of AV, and I know that VTC calls are complex. But I can’t help but feel that we are over-engineering our way into failure.
If you have to use an Echo Canceller to remove an echo that you created by using an Echo Canceller… maybe it’s time to just turn the Echo Canceller off.