Solving Lip-Sync Problems in a
Hybrid Analog-Digital Television Plant
By Steven A. Smith
Vice President of Engineering/Technology
Greenville, South Carolina
In an era of large-screen home television displays, viewers are noticing an increasing number of lip sync errors in broadcast programming. These errors are rooted in a lack of synchronization between audio and video signals, and can easily become a serious problem - particularly in hybrid analog-digital television plants.
The audio-video sync issue has become so severe that the Society of Motion Picture and Television Engineers S22 Committee on Television Systems Technology has formed an Ad Hoc Group on Lip Sync Issues to review all aspects of this problem and make recommendations for solutions.
The problem is caused when delays are introduced as video and audio are processed at different points within a facility. For a live remote signal, for example, the error occurs when the video enters the facility and is processed through a frame synchronizer. This process locks the video to the local in-house timing reference and introduces a delay. From that point on, the audio is ahead of the video.
Delays can occur throughout a television facility, especially at points where analog signals are converted to digital. Additional delays, usually in single-frame increments, are introduced once the signal reaches the production switcher and its DVEs (digital video effects units).
As digital video processing equipment has evolved from field processing and storage to full video frame storage, the problem has increased rapidly.
The CCD optical block used in field and studio cameras adds another field of video delay while the image is being captured, stored and processed in the camera.
Whenever one or more DVE processors are on-air, the associated video sources will be delayed, resulting in additional lip sync error. Branding devices that generate logos, information about current time/temperature, and crawls also contain DVEs and can contribute significantly to the problem. The lip sync problem becomes acute when various combinations of equipment in series are switched in and out of the signal processing path. Even if a plant seems to have no serious lip sync problem, a change in internal processing can suddenly make incoming programming or spots that already carry some delay unacceptable.
There are no industry standards for automatic synchronization of video and audio equipment. Only a few manufacturers have designed interfaces, all of them non-standardized, that let video processing equipment synchronize, or provide a synchronizing signal to, analog or digital audio processing equipment.
In isolation, the error generated by any single device may or may not be noticeable. However, when cascaded with other lip sync errors that typically accumulate from the original acquisition point to the viewer, the end result can become significant, especially to HDTV viewers watching on large-screen displays.
Liberty Corporation, owner of 15 network-affiliated television stations in the Midwestern and Southern United States, has devised a strategy to dramatically reduce audio-video sync errors in our hybrid analog-digital plants. Like all TV stations, our studio facilities are migrating from analog to SDI, which means video and audio delays are constantly changing as new generations of digital equipment are installed. New digital equipment installed today without creating an apparent lip sync problem may be the real cause of a lip sync problem when some other equipment is replaced elsewhere in the facility.
Since we cannot control the problem in all incoming video feeds, we have established an in-house goal to add no (zero) additional errors within our own station facilities.
Rather than try to "mop up" sync errors at the end of the chain in master control, our philosophy is to correct the timing at each step in the process from incoming frame synchronizer throughout the plant to the many devices on the output of master control. By addressing the problem at the point that it occurs, our engineers have been successful at managing lip sync problems until a more elegant solution emerges.
Though there may be other equally capable solutions available in the marketplace, we have successfully attacked in-plant sync errors with products from two manufacturers: Snell & Wilcox's IQ Modular infrastructure system and Pixel Instruments' DG-1200 Automatic Lip Sync Corrector for production switchers.
Snell & Wilcox
From the time several years ago that Liberty Corp. adopted Snell & Wilcox's IQ Modular components for its first DTV transition solution, the benefits of the manufacturer's RollCall infrastructure management system became very clear.
Using a standard Windows PC, dedicated control panels, or both, RollCall provides the communication for configuration, control, and monitoring of over 300 "smart" IQ modules. These modules, with a myriad of functions, can be mixed and matched to build any kind of television infrastructure.
Key to attacking the audio-video sync problem in our plants is RollCall's delay tracking feature. Each analog-to-digital IQ video conversion module has a delay-tracking feature that can be used to control a corresponding audio module. This critical feature allows us to maintain lip sync at any juncture within the plant infrastructure.
Our first use of this feature was in the single-rack DTV Master Control solution designed for each of our stations. Audio and video sync is maintained during the video decode and upconversion process by the automated tracking delay. After a signal passes through the IQ Modular system, it is sent to the Snell & Wilcox HD5200 for upconversion and then passes to the HD master control switcher. The IQ Modular enclosures and HD5200s are networked with RollCall.
Our success with this technique led to an expanded use of IQ modules and RollCall to tame lip sync problems, both large and small, in other parts of the broadcast plant.
Back to frame synchronizers. Previously, when our engineers ran a video signal through a frame synchronizer, it would be locked to the local time base. This could cause lip sync to be up to a frame off. Now, however, RollCall automatically monitors the amount of delay being imposed by the IQ Modular frame synchronizer and tells the corresponding IQ Modular audio A-to-D card how much to delay the audio. We even add an extra frame to compensate for the field camera CCD block delay.
From this point on, the audio card tracks the timing of the video card. The tracking delay is automatic and the lip sync issue is instantly resolved at this stage in the system.
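The arithmetic behind a tracking delay can be sketched in a few lines of Python. This is only an illustration of the idea, not RollCall's control protocol or any Snell & Wilcox API; the function name, the NTSC frame constant, and the default one-frame camera allowance are assumptions made for the example.

```python
# Illustrative sketch only: not the RollCall protocol or any vendor API.
NTSC_FRAME_MS = 1001.0 / 30.0  # about 33.37 ms per NTSC frame


def tracking_audio_delay_ms(sync_delay_ms, camera_frames=1.0, frame_ms=NTSC_FRAME_MS):
    """Return the audio delay needed to match the video path.

    sync_delay_ms: delay currently imposed by the frame synchronizer
    camera_frames: extra allowance for the camera CCD block
                   (hypothetical default of one frame)
    """
    if sync_delay_ms < 0:
        raise ValueError("video delay cannot be negative")
    return sync_delay_ms + camera_frames * frame_ms


# The synchronizer delay drifts between zero and one frame as the remote
# source slips against house reference; the audio delay simply follows it.
for reported in (0.0, 10.0, 33.0):
    audio = tracking_audio_delay_ms(reported)
    print(f"video sync delay {reported:5.1f} ms -> audio delay {audio:6.2f} ms")
```

The point of the sketch is that the audio delay is recomputed whenever the video delay changes, rather than being set once by hand.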
This solution can be used for a variety of sync problems, including constant, small delays that can be corrected later in the chain. For example, a small delay caused by a routing switcher can be managed at the next correction stage using RollCall.
As Liberty's television stations continue to migrate to a greater penetration of digital technology, we will continue implementing the IQ Modular delay solution wherever we find a synchronization issue. This allows easy adjustment of any delay from RollCall's GUI.
Pixel Instruments
Another source of sync problems is at the production switcher. Video delay through the switcher is usually predictable based upon the configuration for any given combination of digital effects.
This predictability, and the fact that all switchers have GPI and/or tally outputs, is the basis for the DG-1200 Automatic Lip Sync Corrector by Pixel Instruments. Using the GPI or tally signals, the DG-1200 generates the appropriate delay commands to steer one or more audio synchronizers to automatically eliminate lip sync errors.
The DG-1200 has twelve input channels, each consisting of a GPI start/stop pulse and a tally line. Each input channel also has a linked delay time register with a user-selectable value from 20 µsec (nominally zero delay) up to 6.5 seconds, in increments of 100 µsec. Delay times can be entered and displayed in milliseconds or in TV fields (NTSC or PAL).
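To make the register range and the field-based entry concrete, here is a minimal Python sketch of the conversion. The range and step size come from the figures above; the function names, the rounding rule, and the way the 20-microsecond floor is applied are assumptions for illustration and do not describe the DG-1200's firmware.

```python
STEP_US = 100          # register increment, in microseconds
MIN_US = 20            # nominal "zero" delay
MAX_US = 6_500_000     # 6.5 seconds

# Field durations in microseconds (NTSC runs at 60/1.001 fields per second).
FIELD_US = {"NTSC": 1_001_000 / 60, "PAL": 20_000.0}


def quantize_delay_us(requested_us):
    """Round to the nearest 100 us step, then clamp to the register range.

    Assumption: the real device may round or clamp differently.
    """
    stepped = round(requested_us / STEP_US) * STEP_US
    return max(MIN_US, min(MAX_US, stepped))


def fields_to_delay_us(fields, standard="NTSC"):
    """Convert a delay entered in TV fields to a register value."""
    return quantize_delay_us(fields * FIELD_US[standard])


def delay_us_to_fields(delay_us, standard="NTSC"):
    """Convert a register value back to TV fields for display."""
    return delay_us / FIELD_US[standard]


one_frame_ntsc = fields_to_delay_us(2, "NTSC")
one_frame_pal = fields_to_delay_us(2, "PAL")
print(one_frame_ntsc / 1000, "ms for one NTSC frame")
print(one_frame_pal / 1000, "ms for one PAL frame")
```

Because an NTSC field is not a whole number of 100-microsecond steps, a delay entered in fields lands on the nearest step rather than on the exact field boundary; the difference is far below anything a viewer could perceive.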
Any input channel and its time value can be routed to any of the five output timers and each timer can steer a separate Pixel Instruments' AD-3100 Audio Synchronizer. The output timers can have different time values and can be turned on and off independently. Any timer can be controlled by more than one input.
For example, say one switcher effect needs a one frame audio delay and another effect needs a two frame audio delay. Input #1 (or any other input) can enable a one frame delay in Timer #3 (or any other timer) and the associated AD-3100. Any other input can be used to enable a two frame delay in the same timer.
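The input-to-timer routing in this example can be modeled with a small Python class. The class and method names are hypothetical, and the rule used when two inputs are active at once (the larger delay wins) is an assumption; the text does not say how the DG-1200 resolves that case.

```python
from dataclasses import dataclass, field


@dataclass
class OutputTimer:
    """Hypothetical model of one output timer steering one audio synchronizer."""

    name: str
    # Delay in frames requested by each active input, keyed by input number.
    active: dict = field(default_factory=dict)

    def set_input(self, input_no, delay_frames, on):
        """Record a GPI/tally change for one input routed to this timer."""
        if on:
            self.active[input_no] = delay_frames
        else:
            self.active.pop(input_no, None)

    def delay_frames(self):
        # Assumption: with several inputs active, the largest delay wins.
        return max(self.active.values(), default=0)


timer3 = OutputTimer("Timer #3")
timer3.set_input(1, 1, on=True)    # first effect on air: one-frame delay
print(timer3.name, timer3.delay_frames())
timer3.set_input(1, 1, on=False)   # first effect off
timer3.set_input(2, 2, on=True)    # second effect on air: two-frame delay
print(timer3.name, timer3.delay_frames())
```

Keeping the requested delay per input, rather than a single value per timer, is what lets one timer serve several switcher effects.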
The most comprehensive solution is to add audio synchronizers ahead of the audio mixer. This configuration ensures that all sources contributing to the program output have the correct lip sync. In another configuration, a single audio synchronizer is added at the output of the audio mixer. The amount of delay added to the audio path is chosen as a compromise for the sources contributing to the program output in any given effect.
For example, in a typical newscast over-the-shoulder shot, the studio anchor has one frame of video delay (camera CCD) and the remote reporter (in the box) has three frames of video delay (camera CCD, remote frame sync and switcher DVE). This does not include the typical baseline delay of the switcher, which is the same for both.
Setting the AD-3100 delay somewhere between one and three frames is a compromise for both sources. The studio anchor's audio will be slightly late and the remote reporter's audio slightly early.
Splitting the difference and choosing two frames is generally not the best choice, since the early audio of the remote reporter is more noticeable than the delayed audio of the studio anchor. However, even a two-frame setting significantly reduces the residual lip sync errors compared to doing nothing at all.
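The trade-off in the over-the-shoulder example can be worked through numerically. In the Python sketch below, each stage is assumed to add exactly one frame of video delay, as in the example; the function names and the 2.5-frame setting are illustrative choices, not recommended values.

```python
FRAMES_PER_STAGE = 1  # each stage in the example adds one frame of video delay

video_path = {
    "studio anchor": ["camera CCD"],
    "remote reporter": ["camera CCD", "remote frame sync", "switcher DVE"],
}


def video_delay_frames(stages):
    """Total video delay for one source, in frames."""
    return len(stages) * FRAMES_PER_STAGE


def residual_errors(paths, audio_delay_frames):
    """Residual lip sync error per source, in frames.

    Positive means the audio is late; negative means the audio is early.
    """
    return {
        source: audio_delay_frames - video_delay_frames(stages)
        for source, stages in paths.items()
    }


# No correction, the midpoint, and a setting biased toward late audio.
for audio_delay in (0, 2, 2.5):
    print(audio_delay, residual_errors(video_path, audio_delay))
```

With no correction, both sources have early audio. At two frames, the anchor is one frame late and the reporter one frame early. A setting above two frames trades a little more lateness on the anchor for less of the more noticeable early audio on the reporter.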
Adding further variability, the video delay of the DVE may be switched in and out of the program path several times in a relatively short time. Current-generation production switchers have the potential of three to nine cascaded DVEs.
Therefore, it is essential that the audio delay track the rapid delay shifts of the video program and "catch up" quickly. The AD-3100 incorporates automatic pitch correction to allow rapid delay change without introducing undesirable artifacts such as pitch shifts, clicks and pops in the output.
Conventional audio synchronizers typically limit the rate of change of delay to around 0.5 percent. This means that for a one frame video delay change at the beginning of a program segment, the audio does not "catch up" until almost 10 seconds later. And another 10 second "catch up" period occurs at the end of the segment when the video delay reverts to normal.
The AD-3100 has an adjustable rate of delay change of up to 25 percent. So, in the example of a one frame change in the video delay, the AD-3100 will "catch up" in just a few frames - well before the viewer will notice.
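A rough estimate of the catch-up time follows from dividing the required delay change by the rate of change. The Python sketch below assumes a simple linear model in which a rate of 0.5 percent means the delay moves by 0.005 seconds for each second of program; real synchronizers may ramp differently, so the results are order-of-magnitude figures only.

```python
NTSC_FRAME_S = 1001 / 30000  # about 33.37 ms


def catch_up_seconds(delay_change_s, rate):
    """Time needed to slew the audio delay by delay_change_s.

    rate: fractional rate of delay change (0.005 for 0.5 percent).
    Assumption: the delay ramps linearly at exactly this rate.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return delay_change_s / rate


for label, rate in (("0.5 percent", 0.005), ("25 percent", 0.25)):
    seconds = catch_up_seconds(NTSC_FRAME_S, rate)
    frames = seconds / NTSC_FRAME_S
    print(f"{label}: {seconds:.2f} s ({frames:.0f} frames)")
```

Under this model, a one-frame NTSC change takes several seconds at 0.5 percent but only about four frames at 25 percent, which is consistent with the contrast described above.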
A Permanent Solution
The two solutions described here have helped bring lip sync problems under control at Liberty's stations. At the very least, we no longer make lip sync problems any worse after programming passes through our stations.
However, our solutions conform to no current technical standards, since there are none for the hybrid analog-digital facilities that most broadcasters use today. Our creative personnel, especially operators of production switchers, have been made aware that their extensive use of video effects can cause problems. For this reason, we seek to keep any DVE effects at a single frame delay. It's a trade-off we shouldn't have to make.
If only video production switchers had a standard control interface to audio processing equipment and delay compensation equipment such as the AD-3100, compensating for rapidly shifting video and audio timing would be automatic.
Unfortunately, as the popularity of HDTV grows among viewers and more programming options become available, lip sync problems are probably going to get worse. These errors are magnified on HDTV. Higher resolution and close-ups on larger screens can make the out-of-sync sound and pictures very annoying to viewers.
The Advanced Television Systems Committee (ATSC) addressed the audio-video synchronization problem in DTV. Its study ultimately led to IS/191, ATSC Implementation Subcommittee Finding: Relative Timing of Sound and Vision for Broadcast Operations, which states that the audio should never lead the video program by more than 15 milliseconds, and should never lag the video program by more than 45 milliseconds.
We support the continuing efforts of SMPTE to find a solution for this problem and encourage support of SMPTE's pending "Request for Information on Sources of Lip Sync Errors, Measurement, and Correction." We also urge SMPTE to embrace not only all-digital solutions, but also those that can be implemented in today's hybrid analog-digital plants.
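The asymmetric window is easy to express as a check. In this Python sketch the sign convention (positive offset means the audio lags the video) and the function name are assumptions chosen for the example; only the 15 and 45 millisecond limits come from the finding cited above.

```python
MAX_AUDIO_LEAD_MS = 15.0  # audio ahead of video
MAX_AUDIO_LAG_MS = 45.0   # audio behind video

NTSC_FRAME_MS = 1001.0 / 30.0  # about 33.37 ms


def within_tolerance(audio_offset_ms):
    """Return True if the offset falls inside the +45/-15 ms window.

    audio_offset_ms > 0: audio lags the video.
    audio_offset_ms < 0: audio leads the video.
    """
    return -MAX_AUDIO_LEAD_MS <= audio_offset_ms <= MAX_AUDIO_LAG_MS


for frames in (-1, 1, 2):
    offset = frames * NTSC_FRAME_MS
    print(f"{frames:+d} frame(s) = {offset:+.1f} ms -> {within_tolerance(offset)}")
```

A single NTSC frame of late audio fits inside the window, but a single frame of early audio does not, and neither does two frames of late audio. That leaves very little room for uncorrected frame-based delays anywhere in the chain.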
In such hybrid plants, the permanent solution for the lip sync problem is an intelligent dynamic technology that can determine the specific amount of delay and instantly adjust the number of frames of delay needed anywhere in the broadcast chain. This means manufacturers would have to design equipment that continually keeps audio and video in sync.
Today, not a single vendor builds a production switcher that compensates for audio delay. Nor is there any test and measurement equipment available to evaluate sync errors in a hybrid broadcast plant. Our engineers note errors simply by looking at lips. We use a high resolution monitor, wait for the right shot, and eyeball the delay. Not very high-tech, but we've found no better measurement tool.
The good news is that engineers can solve most lip sync problems by attacking the delay at each point in the broadcast chain. Equipment is available to make the corrections. However, station engineers must be vigilant in locating and correcting the errors throughout their plant. Until a better way comes along, this means using our eyes as well as our heads.
What's that saying? "Read my lips."