Synchronization of Multiple Audio Devices

By Reed Lawson
MTS - Silicon Graphics Inc.
September 1997

Introduction

With the introduction of the O2, Octane, Origin, and Onyx2 product lines comes an updated Audio Library (AL 2.0) with the flexibility to handle multiple audio I/O devices. (Patches are available for IRIX 6.2 and 6.3.) This paper explains how to manage multiple audio input and output streams using AL 2.0 and how to maintain sample accuracy amongst multiple audio streams.

Sample Accurate Support

Some of the earlier audio hardware from SGI cannot support sample accurate sync. The audio hardware in these systems lacks the necessary time stamping hardware. In these earlier systems, the function is simulated in software at the kernel driver level and is accurate to within +/- 4 audio sample frames. This is rarely a problem, since these systems have a small number of audio devices; one may never have occasion to sync them.

To date, the following systems support sample accurate sync: Octane base audio, Origin2000 w/MIO, Onyx2 base audio, the PCI-AUD-8C digital audio option card for O2, Octane, Origin2000, Origin 200, and Onyx2. The rest of this paper speaks to synchronization of multiple audio devices on these supported systems.

Realtime Management of the Audio Streams

The SGI Audio Library provides the developer with a rich set of control and query functions enabling straightforward and clean real time management of audio streams. The time stamping and stream counting facilities are provided by the alGetFrameTime() and alGetFrameNumber() library functions.

Each audio device (analog in, digital out, etc.) has a 64 bit hardware counter associated with it that is always increasing in value as long as the port is open, so that each sample frame of audio has a serial number. This counter is referred to as the Media Stream Counter, or MSC. Each time this counter changes value, a system time stamp is stored in the audio hardware alongside it. This time stamp is 64 bits wide and represents the number of nanoseconds that have elapsed since the system was powered up or reset (known as Unadjusted System Time, or UST). The audio system software reads this pair of values from the hardware periodically (around once every millisecond) to make it available to user software via the alGetFrameTime() library call. The UST returned by this call truly represents the system time at which the counter changed to the value returned as the MSC. The accuracy of the time stamp is better than +/- 1 microsecond.

So, the two big things to notice here are:

  1. The UST returned by alGetFrameTime() represents the very beginning of the new audio frame (the serial number of which is given by the returned MSC) to an accuracy better than 1 microsecond.
  2. Even though the pair is stored in the hardware at a rate equal to the sampling rate, the audio system software only grabs the pair once a millisecond to make it available to user software.

Using UST to Associate Audio Frames Across Multiple Devices

Just because two audio devices are running at the same sample rate does not mean that their audio frames line up exactly. In fact, if there is no hardware mechanism locking them together, they are probably slipping slowly with respect to each other. In SGI systems, there is usually a way to frequency lock output devices and analog input devices to a common word clock or video sync signal. However, digital input devices are at the mercy of the sending end.

If you expect your application to produce sample accurate results, you must frequency lock your input sources and/or output devices to a common clock source (like house video sync or AES 'Black'). Even when sources are frequency locked, that does not mean they are phase locked. In order to correctly associate audio frames amongst multiple devices, you must first determine the phase relationship between them. This can be done by comparing the USTs returned from calls to alGetFrameTime() for each device. When you call alGetFrameTime() for each of two ports, you obtain two values of UST (say, UST1 and UST2) and two values of MSC (say, MSC1 and MSC2). The difference between the USTs indicates the amount of phase shift between the MSC counters. The difference between the MSCs is an important constant that is used to sync up the audio queues (more on this later).

Phase angle is historically represented in degrees of a circle. For simplicity, let us work in percent. Suppose, for example, that MSC2 changes about 20% into MSC1's audio frame. Computing this phase percent is very straightforward. Simply divide the difference between the two UST values by the number of UST clocks (nanoseconds) in a frame.

Phase = ((UST2 - UST1)/nanoseconds_per_frame) * 100;

Once we have this number, we can make a judgment about which frame number in one audio stream (MSC1) belongs to which frame number in the other (MSC2). If the phase we compute is less than 50 and greater than -50, then we are safe to associate the two frames. If it is outside this range, then we need to compute a new MSC2 which can be associated with MSC1.

/* Compute how many full cycles off we are */
FramesOff = 0;
if(Phase >  50) FramesOff = (Phase + 50)/100;
if(Phase < -50) FramesOff = (Phase - 50)/100;

/* Adjust the second MSC to align as closely as possible to the first */
MSC2 -= FramesOff;

Note also that, even though the two calls to alGetFrameTime() are very close together, the USTs returned may be many audio cycles apart. The above math will still work, though, because the finer phase information is available in the two least significant decimal digits of the "Phase" variable.
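The phase computation and the MSC adjustment above can be collected into one routine. The following is a plain C sketch with a hypothetical name; it makes no Audio Library calls, and the (UST, MSC) pairs are passed in as arguments. The truncating casts reproduce the integer division in the snippet above:

```c
typedef long long stamp_t;  /* stands in for the AL's 64 bit stamp_t */

/* Compute the phase (in percent of a frame) between two ports from
 * their alGetFrameTime() results, then fold whole frames of phase
 * into MSC2.  Returns the adjusted MSC2; the constant offset is then
 * simply (adjusted MSC2) - MSC1. */
static stamp_t align_msc2(stamp_t ust1, stamp_t ust2, stamp_t msc2,
                          double sample_rate)
{
    double nanoseconds_per_frame = 1.0e9 / sample_rate;
    double phase = ((double)(ust2 - ust1) / nanoseconds_per_frame) * 100.0;
    long long frames_off = 0;

    /* The casts truncate toward zero, matching integer division. */
    if (phase >  50.0) frames_off = (long long)((phase + 50.0) / 100.0);
    if (phase < -50.0) frames_off = (long long)((phase - 50.0) / 100.0);

    return msc2 - frames_off;
}
```

At a 50 kHz rate (20000 ns per frame), a UST difference of 4000 ns is a phase of 20%, so MSC2 is left alone; a difference of 24000 ns is a phase of 120%, so MSC2 is pulled back by one frame.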

MSC Offsets Between Ports

If two audio ports are running at the same sampling rate and locked to the same sync source, then the difference (offset) between their MSCs will never change as long as the two ports are up and running. This offset is easily computed once the above phasing issue has been accounted for; simply subtract MSC1 from MSC2.

/* Compute MSC1 and MSC2 offset */
Offset_1to2 = MSC2 - MSC1;

Half of the problem of synchronizing audio is over once you have gotten this far. The next step is to align samples you intended to be aligned in the output or input queues. Let's look at the output case first.

Stuffing the Audio Output Queues Synchronously

The Audio Library function alGetFrameNumber() returns a 64 bit number indicating what serial number (MSC) will be assigned to the next sample you write to the audio queue. This value is valid only until the queue underflows, so one must "prime the pump," so to speak, in order to assess the queue positions. The following procedure is recommended to synchronize two output ports:

  1. With the output ports already open, write a specific number of zero frames to each using alWriteFrames(). The number you choose depends on how much work needs to be done before you can start stuffing real audio. The idea is to stuff enough zeros so that the ports will not underflow before you are able to get real audio written.
  2. Call alGetFrameNumber() for each output port. The difference between these numbers should be near the Offset_1to2 that we obtained in the exercise above.
  3. If this difference in the queue positions is not identical to Offset_1to2 (which is very likely), stuff a number of extra zeros in the appropriate port in order to align the positions.
  4. You may now start stuffing audio in these queues confident that each sample you stuff in the first queue will be aligned to the corresponding ones you stuff in the second.

Once this initialization phase is done, you are ready to enter the steady state audio port feeding loop which uses alWriteFrames() to stream your data to the output devices.
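The arithmetic behind steps 2 and 3 can be sketched in plain C. The helper name is hypothetical; in a real program the frame numbers would come from alGetFrameNumber() after priming, and the zero frames would be written with alWriteFrames():

```c
typedef long long stamp_t;  /* stands in for the AL's 64 bit stamp_t */

/* Given the next frame numbers of two primed output ports and the
 * known MSC offset between the ports (Offset_1to2 = MSC2 - MSC1),
 * compute how many extra zero frames to stuff into each port so that
 * the next real frames written are aligned.  Writing pad frames to a
 * port advances its frame number, so we pad whichever port is behind. */
static void output_zero_pad(stamp_t fn1, stamp_t fn2, stamp_t offset_1to2,
                            long long *pad1, long long *pad2)
{
    /* We want (fn2 - fn1) to equal offset_1to2 exactly. */
    long long delta = (long long)((fn2 - fn1) - offset_1to2);

    *pad1 = delta > 0 ?  delta : 0;  /* port 2 is ahead: pad port 1 */
    *pad2 = delta < 0 ? -delta : 0;  /* port 1 is ahead: pad port 2 */
}
```

When the queue positions already differ by exactly Offset_1to2, both pad counts come out zero and you can start stuffing real audio immediately.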

Reading the Audio Input Queues Synchronously

Input is the same as output but the other way around (hee-hee). Um, right. When you open an input port, the queue begins filling immediately. There is no need to "prime the pump" as was necessary in the case of the output port. Just open the port, read the queue positions with alGetFrameNumber(), and pull excess frames off the appropriate queues with alDiscardFrames() to align them. This has to be done before the queue has a chance to overflow, so it's best to open the audio ports just before you are ready to align the queues. Once this initialization process is complete, you need to get right into the steady state loop that reads these input queues before they overflow.
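The input alignment really is the mirror image: discarding frames from a queue advances that port's read position just as writing pad frames advances an output port. A plain C sketch of the discard counts, with a hypothetical helper name (the real discards would be done with alDiscardFrames()):

```c
typedef long long stamp_t;  /* stands in for the AL's 64 bit stamp_t */

/* Given the next frame numbers of two freshly opened input ports and
 * the known MSC offset between them, compute how many frames to
 * discard from each queue so that the next frames read are aligned. */
static void input_discard_counts(stamp_t fn1, stamp_t fn2,
                                 stamp_t offset_1to2,
                                 long long *drop1, long long *drop2)
{
    /* We want (fn2 - fn1) to equal offset_1to2 exactly. */
    long long delta = (long long)((fn2 - fn1) - offset_1to2);

    *drop1 = delta > 0 ?  delta : 0;  /* port 2 is ahead: drop from 1 */
    *drop2 = delta < 0 ? -delta : 0;  /* port 1 is ahead: drop from 2 */
}
```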

Maintaining Synchronicity in Steady State

Once you have the audio ports started in sample accurate sync, there are only two ways that they can get out of sync.

  1. Audio Port under/overflow.

    If your audio port feeding loop ever lets the port underflow (or overflow for the input case), then the port will drop samples until you give it more to send (or make some room for new samples). Two precautions help ensure that this does not happen: run the feeding loop at a high, non-degrading priority so it is never starved for CPU, and size the port's queue large enough to ride out scheduling gaps.

  2. Audio ports not frequency locked

    If you expect the ports to be sample accurately synchronized, you must make sure they have their sample clocks locked. In the systems that support sample accurate sync, the audio output ports may be locked to any of five sources: internal, the AES input, the ADAT input, the video reference input, or the internal video sync. Make sure that all the outputs you want to remain in sync are locked to the same source and running at the same sampling rate.

    For digital input, you are at the mercy of the devices feeding you. Make sure if you can that the devices feeding you are locked to a common clock.

If you guard against the above two pitfalls, there is no reason your audio should come out of sample accurate sync. It's like the cogs in a gear box: they should never slip.

However, if you want to check up on the sync (perhaps to warn the user that something is slipping), use the same method used to start the ports and compare the MSC offsets to those you obtained at startup. The offsets between the ports (as computed from calls to alGetFrameTime()) should be identical to the offsets found when the ports were first opened, provided they are frequency locked. This test will not detect over/underflow.

Detecting over/underflow is slightly more complicated. Here is what to do for the output underflow case: at the beginning of every loop, make a call to alGetFrameNumber() before calling alWriteFrames(), and save that value away. At the beginning of the next loop, compare the saved value to the value returned from a fresh call to alGetFrameNumber(). The difference should be the number of frames you wrote in the call to alWriteFrames() in the last loop. If not, there was an underflow. Input overflow is detected in a similar manner.
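That bookkeeping reduces to a one-line check. The helper name below is hypothetical; the two frame numbers would come from successive alGetFrameNumber() calls bracketing the alWriteFrames() call:

```c
typedef long long stamp_t;  /* stands in for the AL's 64 bit stamp_t */

/* Returns nonzero if the output port underflowed between two
 * successive readings of alGetFrameNumber().  `prev` is the value
 * saved before the last alWriteFrames() call, `now` is a fresh
 * reading, and `written` is the number of frames written in between.
 * If no samples were dropped, the counter advances by exactly
 * `written`; any extra advance means the port ran dry. */
static int output_underflowed(stamp_t prev, stamp_t now, long long written)
{
    return (now - prev) != written;
}
```

The input overflow test is the same comparison with the frame count passed to alReadFrames().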

Sample Code

Starting with patch1908 and patch2112, there is a sample program written by the author of this paper that plays multiple audio files across multiple output devices. You will find it in /usr/share/src/dmedia/audio/waop.c. It has examples of how to start the ports in sample accurate sync, how to schedule a high non-degrading priority, how to check for underflow while the audio is playing, and how to use alWriteBuffers() for more flexible audio output.

Further Reading