By Reed Lawson
MTS - Silicon Graphics Inc.
September 1997
With the introduction of the O2, Octane, Origin, and Onyx2 product lines comes an updated Audio Library (AL 2.0) with the flexibility to handle multiple audio I/O devices. (Patches are available for IRIX 6.2 and 6.3.) This paper explains how to manage multiple audio input and output streams using AL 2.0 and how to maintain sample accuracy among multiple audio streams.
Some of the earlier audio hardware from SGI cannot support sample accurate sync because it lacks the necessary time stamping hardware. In these earlier systems, the function is simulated in software at the kernel driver level and is accurate to within +/- 4 audio sample frames. This is rarely a problem, since these systems have a small number of audio devices; one may never have occasion to sync them.
To date, the following systems support sample accurate sync: Octane base audio, Origin2000 w/MIO, Onyx2 base audio, and the PCI-AUD-8C digital audio option card for O2, Octane, Origin2000, Origin200, and Onyx2. The rest of this paper addresses synchronization of multiple audio devices on these supported systems.
The SGI Audio Library provides the developer with a rich set of control and query functions enabling straightforward and clean real-time management of audio streams. The time stamping and stream counting facilities are provided by the alGetFrameTime() and alGetFrameNumber() library functions.
Each audio device (analog in, digital out, etc.) has a 64-bit hardware counter associated with it that is always increasing in value as long as the port is open, so that each sample frame of audio has a serial number. This counter is referred to as the Media Stream Counter, or MSC. Each time this counter changes value, a system time stamp is stored in the audio hardware alongside it. This time stamp is 64 bits wide and represents the number of nanoseconds that have elapsed since the system was powered up or reset (known as Unadjusted System Time, or UST). The audio system software reads this pair of values from the hardware periodically (around once every millisecond) to make it available to user software via the alGetFrameTime() library call. The UST returned by this call truly represents the system time at which the counter changed to the value returned as the MSC. The accuracy of the time stamp is better than +/- 1 microsecond.
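As a minimal sketch of the call itself (assuming a port already opened with alOpenPort() and the usual AL 2.0 headers; error handling is abbreviated), reading the atomic UST/MSC pair looks like this:

    #include <stdio.h>
    #include <dmedia/audio.h>   /* ALport, alGetFrameTime(); stamp_t is a 64-bit type */

    /* Print the most recently recorded UST/MSC pair for an open port. */
    void print_frame_time(ALport port)
    {
        stamp_t msc, ust;

        if (alGetFrameTime(port, &msc, &ust) < 0) {
            fprintf(stderr, "alGetFrameTime failed\n");
            return;
        }
        printf("frame %lld was at the jack at UST %lld ns\n",
               (long long)msc, (long long)ust);
    }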
So, the two big things to notice here are:
Just because two audio devices are running at the same sample rate does not mean that their audio frames line up exactly. In fact, if there is no hardware mechanism locking them together, they are probably slipping slowly with respect to each other. In SGI systems, there is usually a way to frequency lock output devices and analog input devices to a common word clock or video sync signal. However, digital input devices are at the mercy of the sending end.
If you expect your application to produce sample accurate results, you must frequency lock your input sources and/or output devices to a common clock source (like house video sync or AES 'Black'). Even when sources are frequency locked, that does not mean they are phase locked. In order to correctly associate audio frames amongst multiple devices, you must first determine the phase relationship between them. This can be done by comparing the USTs returned from calls to alGetFrameTime() for each device. When you call alGetFrameTime() for each of two ports, you obtain two values of UST (say, UST1 and UST2) and two values of MSC (say, MSC1 and MSC2). The difference between the USTs indicates the amount of phase shift between the MSC counters. The difference between the MSCs is an important constant that is used to sync up the audio queues (more on this later).
Phase angle is historically represented in degrees of a circle. For simplicity, let us work in percent. Suppose, for example, that MSC2 changes about 20% into MSC1's audio frame. Computing this phase percent is very straightforward: express the difference between the two UST values as a percentage of the number of UST clocks (nanoseconds) in one frame.
/* Multiply before dividing: integer division would otherwise discard the sub-frame phase */
Phase = ((UST2 - UST1) * 100) / nanoseconds_per_frame;
Once we have this number, we can make a judgement about which frame number in one audio stream (MSC1) belongs with which frame number in the other (MSC2). If the phase we compute is less than 50 and greater than -50, then it is safe to associate the two frames. If it is outside this range, then we need to compute a new MSC2 that can be associated with MSC1.
/* Compute how many full cycles off we are */
FramesOff = 0;
if (Phase > 50)
    FramesOff = (Phase + 50) / 100;
if (Phase < -50)
    FramesOff = (Phase - 50) / 100;

/* Adjust the second MSC to align as closely as possible to the first */
MSC2 -= FramesOff;
Note also that, even though the two calls to alGetFrameTime() are made very close together, the USTs returned may be many audio frames apart. The above math still works, though, because the finer phase information is carried in the two least significant decimal digits of the Phase variable.
If two audio ports are running at the same sampling rate and locked to the same sync source, then the difference (offset) between their MSCs will never change as long as the two ports are up and running. This offset is easily computed once the above phasing issue has been accounted for; simply subtract MSC1 from MSC2.
/* Compute MSC1 and MSC2 offset */ Offset_1to2 = MSC2 - MSC1;
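Putting the phase and offset steps together, a sketch of the whole computation might look like the following. The helper name compute_offset() and the nsec_per_frame parameter are illustrative, and error returns are ignored for brevity; nsec_per_frame comes from the shared sample rate, e.g. 1000000000/48000 at 48 kHz.

    #include <dmedia/audio.h>

    /* Return the constant MSC offset (MSC2 - MSC1) between two
     * frequency-locked ports running at the same sample rate.
     * nsec_per_frame = 1000000000 / sample_rate. */
    stamp_t compute_offset(ALport port1, ALport port2, stamp_t nsec_per_frame)
    {
        stamp_t msc1, msc2, ust1, ust2;
        stamp_t phase, frames_off = 0;

        alGetFrameTime(port1, &msc1, &ust1);
        alGetFrameTime(port2, &msc2, &ust2);

        /* Phase in percent of a frame; multiply before dividing so the
         * sub-frame fraction survives integer division. */
        phase = ((ust2 - ust1) * 100) / nsec_per_frame;

        /* Fold MSC2 to the frame nearest MSC1's frame boundary. */
        if (phase > 50)
            frames_off = (phase + 50) / 100;
        else if (phase < -50)
            frames_off = (phase - 50) / 100;
        msc2 -= frames_off;

        return msc2 - msc1;   /* Offset_1to2 */
    }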
Half of the problem of synchronizing audio is solved once you have gotten this far. The next step is to align, in the output or input queues, the samples you intend to be aligned. Let's look at the output case first.
The Audio Library function alGetFrameNumber() returns a 64-bit number indicating what serial number (MSC) will be assigned to the next sample you write to the audio queue. This value is valid only until the queue underflows, so one must "prime the pump," so to speak, in order to assess the queue positions. The recommended procedure for synchronizing two output ports is therefore: prime each queue with a few frames of silence, read each queue's position with alGetFrameNumber(), and pad the appropriate queue with just enough additional silence to cancel the MSC offset computed earlier.
Once this initialization phase is done, you are ready to enter the steady state audio port feeding loop which uses alWriteFrames() to stream your data to the output devices.
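A minimal sketch of this initialization, assuming two already-opened output ports, the Offset_1to2 computed above, hypothetical constants PRIME_FRAMES and CHANNELS, and that the pad fits within the silence buffer:

    #include <string.h>
    #include <dmedia/audio.h>

    #define PRIME_FRAMES 1000   /* hypothetical priming amount     */
    #define CHANNELS     2      /* hypothetical channels per frame */

    /* Pad one queue with silence so the first program frames written
     * to the two ports play at the same instant. */
    void sync_output_ports(ALport port1, ALport port2, stamp_t offset_1to2)
    {
        short silence[PRIME_FRAMES * CHANNELS];
        stamp_t next1, next2, delta;

        memset(silence, 0, sizeof silence);

        /* Prime the pump: alGetFrameNumber() is valid only while the
         * queue has not underflowed. */
        alWriteFrames(port1, silence, PRIME_FRAMES);
        alWriteFrames(port2, silence, PRIME_FRAMES);

        /* Serial number of the next frame each port will accept. */
        alGetFrameNumber(port1, &next1);
        alGetFrameNumber(port2, &next2);

        /* Pad whichever port is ahead so that the next frames written
         * sit exactly Offset_1to2 apart, i.e. play simultaneously. */
        delta = (next2 - next1) - offset_1to2;
        if (delta > 0)
            alWriteFrames(port1, silence, (int)delta);
        else if (delta < 0)
            alWriteFrames(port2, silence, (int)-delta);
    }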
Input is the same as output, but the other way around. When you open an input port, the queue begins filling immediately, so there is no need to "prime the pump" as there was for the output port. Just open the port, read the queue positions with alGetFrameNumber(), and pull excess frames off the appropriate queues with alDiscardFrames() to align them. This has to be done before the queue has a chance to overflow, so it's best to open the audio ports just before you are ready to align the queues. Once this initialization is complete, get right into the steady state loop that reads these input queues before they overflow.
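A matching sketch for the input side, again with a hypothetical helper name and the Offset_1to2 constant computed from alGetFrameTime():

    #include <dmedia/audio.h>

    /* Align two freshly opened input ports by discarding the frames
     * that arrived before the common starting instant. */
    void sync_input_ports(ALport port1, ALport port2, stamp_t offset_1to2)
    {
        stamp_t next1, next2, delta;

        /* Input queues fill on their own; just read the positions. */
        alGetFrameNumber(port1, &next1);   /* next frame to be read */
        alGetFrameNumber(port2, &next2);

        /* Discard from whichever queue holds the older audio. */
        delta = (next2 - next1) - offset_1to2;
        if (delta > 0)
            alDiscardFrames(port1, (int)delta);
        else if (delta < 0)
            alDiscardFrames(port2, (int)-delta);
    }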
Once you have the audio ports started in sample accurate sync, there are only two ways that they can get out of sync.
If your audio port feeding loop ever lets the port underflow (or overflow, in the input case), the port will drop samples until you give it more to send (or make room for new samples). Two precautions help ensure this does not happen: schedule your feeding loop at a high, non-degrading priority so other processes cannot starve it, and use an audio queue large enough to ride out any scheduling latency.
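For the first precaution, a sketch of one common IRIX approach, using schedctl() with a non-degrading priority (the exact constants and required privileges should be checked against your system):

    #include <stdio.h>
    #include <sys/schedctl.h>

    /* Ask for the highest non-degrading priority for this process
     * (pid 0 means the calling process; needs appropriate privilege). */
    void set_feed_loop_priority(void)
    {
        if (schedctl(NDPRI, 0, NDPHIMAX) < 0)
            perror("schedctl(NDPRI)");
    }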
If you expect the ports to be sample accurately synchronized, you must make sure they have their sample clocks locked. In the systems that support sample accurate sync, the audio output ports may be locked to any of five sources: internal, the AES input, the ADAT input, the video reference input, or the internal video sync. Make sure that all the outputs you want to remain in sync are locked to the same source and running at the same sampling rate.
For digital input, you are at the mercy of the devices feeding you. If you can, make sure those devices are locked to a common clock.
If you guard against the above two pitfalls, there is no reason your audio should fall out of sample accurate sync. It's like the cogs in a gearbox: they should never slip.
However, if you want to check up on the sync (perhaps to warn the user that something is slipping), use the same method used to start the ports and compare the MSC offsets to those you obtained at startup. If the ports are frequency locked, the offsets between them (as computed from calls to alGetFrameTime()) should be identical to the offsets found when the ports were first opened. Note that this test will not detect over/underflow.
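Using the compute_offset() helper sketched earlier, such a check could look like:

    /* Returns nonzero while the ports are still in sync: the offset
     * between frequency-locked ports must never drift. */
    int still_in_sync(ALport port1, ALport port2,
                      stamp_t nsec_per_frame, stamp_t initial_offset)
    {
        return compute_offset(port1, port2, nsec_per_frame) == initial_offset;
    }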
Detecting over/underflow is slightly more complicated. Here is what to do for the output underflow case: at the beginning of every loop, call alGetFrameNumber() before calling alWriteFrames(), and save that value away. At the beginning of the next loop, compare the saved value to the value returned by a fresh call to alGetFrameNumber(). The difference should be exactly the number of frames written by the previous loop's call to alWriteFrames(); if not, there was an underflow. Input overflow is detected in a similar manner.
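A sketch of one feed-loop pass with this check, assuming a fixed chunk size of nframes per pass and 16-bit samples; prime the saved value once before the loop so the first comparison passes:

    #include <stdio.h>
    #include <dmedia/audio.h>

    /* One pass of the output feed loop with underflow detection.
     * Prime beforehand with:
     *     alGetFrameNumber(port, &saved);  saved -= nframes;        */
    void feed_once(ALport port, short *buf, int nframes, stamp_t *saved)
    {
        stamp_t msc;

        alGetFrameNumber(port, &msc);
        if (msc - *saved != nframes)   /* port advanced by more than we wrote */
            fprintf(stderr, "underflow: dropped %lld frames\n",
                    (long long)(msc - *saved - nframes));
        *saved = msc;
        alWriteFrames(port, buf, nframes);
    }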
Starting with patch1908 and patch2112, there is a sample program written by the author of this paper that plays multiple audio files across multiple output devices. You will find it in /usr/share/src/dmedia/audio/waop.c. It has examples of how to start the ports in sample accurate sync, how to schedule a high non-degrading priority, how to check for underflow while the audio is playing, and how to use alWriteBuffers() for more flexible audio output.