WWW: Beyond the Basics

11. Real-time Audio and Video

11.2 Digital Audio

Sound is produced by the vibration of matter. Audio refers to sound within the human hearing range. An audio signal in the natural world is analog, which is continuous in both time and amplitude. To be stored and processed by computers and transmitted through computer networks, the analog signal must be converted to digital form. First, the audio signal is converted to an analog electric signal through a microphone. Then, this analog electric signal is digitized by an Analog-to-Digital Converter (ADC). The analog-to-digital conversion comprises two steps: (1) sampling, and (2) quantization. Digital audio may then be encoded using different schemes to reduce the file size.

11.2.1 Sampling

To sample a signal means to examine it at some point in time, as shown in Figure 1. Sampling usually happens at equally spaced intervals; this interval is called the sampling interval. The reciprocal of the sampling interval is called the sampling frequency or sampling rate. The unit of the sampling interval is seconds. The unit of the sampling rate is Hz, which means cycles per second.

In Figure 1, the sampling interval is 1 usec (10^-6 sec), or, in other words, the sampling frequency is 1 MHz (10^6 Hz). This means that the ADC samples this sine wave every 1 usec.

Assume the highest frequency component of the analog signal is f; to reconstruct the original analog signal faithfully, the sampling rate must be at least 2f. This is called the sampling theorem (also known as the Nyquist theorem). In Figure 1, the frequency of the sine wave is 2x10^4 Hz, and the sampling frequency is 10^6 Hz, which is much greater than 2x2x10^4 Hz. So this sine wave can be faithfully reconstructed back to an analog signal.
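As a minimal sketch of the sampling theorem, the following Python fragment checks the Figure 1 numbers (the function name `min_sampling_rate` is illustrative, not from the text):

```python
import math

def min_sampling_rate(f_max_hz):
    """Sampling theorem: to reconstruct a signal whose highest
    frequency component is f_max, sample at a rate of at least 2 * f_max."""
    return 2 * f_max_hz

# The sine wave in Figure 1: 2x10^4 Hz, sampled at 10^6 Hz.
signal_freq = 2e4      # Hz
sampling_rate = 1e6    # Hz (sampling interval = 1 usec)

nyquist_rate = min_sampling_rate(signal_freq)
print(nyquist_rate)                       # 40000.0
print(sampling_rate >= nyquist_rate)      # True: faithful reconstruction possible

# The ADC's view of the wave: one sample every 1 usec.
samples = [math.sin(2 * math.pi * signal_freq * n / sampling_rate)
           for n in range(50)]
```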


Figure 1: Sampled waveform.


Because human hearing is limited to a range of 20 Hz to 20 kHz, the sampling frequency for CD quality is 44.1 kHz. Since the energy of human speech is concentrated below about 3,400 Hz, an 8,000 Hz sampling frequency is high enough for telephony-quality audio.

11.2.2 Quantization

To quantize a signal means to determine the signal's value to some degree of accuracy. Figure 2 shows the same analog signal being quantized. The digital signal is defined only at the points at which it is sampled. The height of each vertical bar can take on only certain values, shown by horizontal dashed lines, which are sometimes higher and sometimes lower than the original signal, indicated by the dashed curve. In Figure 2, 11 quantization levels are used, and hence 4 bits are needed to encode each sample.
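A uniform quantizer like the one in Figure 2 can be sketched in a few lines of Python (the mid-point reconstruction used here is one common choice, not the only one):

```python
import math

def quantize(x, bits):
    """Uniform quantization of a sample x in [-1, 1] to 2**bits levels;
    returns the mid-point of the level that x falls into."""
    levels = 2 ** bits
    step = 2.0 / levels                   # quantization step size
    index = min(int((x + 1.0) / step), levels - 1)
    return -1.0 + (index + 0.5) * step

# Sample and quantize one cycle of a sine wave with 4 bits (16 levels).
samples = [math.sin(2 * math.pi * n / 32) for n in range(32)]
quantized = [quantize(s, 4) for s in samples]

# The error of mid-point quantization is bounded by half a step:
# (2 / 16) / 2 = 0.0625.
max_error = max(abs(q - s) for q, s in zip(quantized, samples))
print(max_error <= 0.0625)  # True
```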



Figure 2: Four-bit quantization


If the height of each bar is translated into a digital number, then the signal is said to be represented by pulse-code modulation, or PCM.

The difference between a quantized representation and the original analog signal is called the quantization noise. The more bits used to quantize a PCM signal, the lower the quantization noise, and the clearer the signal sounds.

Using a higher sampling frequency and more quantization bits produces better-quality digital audio, but for the same length of audio the file size is much larger than for low-quality audio. For example, CD-quality audio uses a 44.1 kHz sampling rate and 16-bit amplitude. The resulting aggregate bit rate (bits per second) of a stereophonic (2-channel) CD-audio stream is thus 44.1*16*2 = 1,411.2 kbps. Telephony-quality audio, on the other hand, uses an 8 kHz sampling rate and 8-bit amplitude, so one second of speech takes only 8*8 = 64 kbits.
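The bit-rate arithmetic above can be checked directly (the helper name `pcm_bit_rate` is illustrative):

```python
def pcm_bit_rate(sampling_rate_hz, bits_per_sample, channels):
    """Aggregate bit rate of an uncompressed PCM stream, in bits per second."""
    return sampling_rate_hz * bits_per_sample * channels

cd = pcm_bit_rate(44_100, 16, 2)    # stereo CD-quality audio
phone = pcm_bit_rate(8_000, 8, 1)   # telephony-quality speech

print(cd)     # 1411200 -> 1,411.2 kbps
print(phone)  # 64000   -> 64 kbits per second of speech
```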

11.2.3 Other Encoding Schemes

PCM is an uncompressed audio format. Real-world requirements may make it impossible to handle the full bit stream of, for example, CD-quality audio. In the following we introduce some other audio encoding methods that are used to compress digital audio.

11.2.3.1 A-law and µ-law Encoding

For speech signals, a system whose quantization step size increases logarithmically with the level of the signal is widely used. This allows a larger range of values to be covered with the same number of bits. The International Telecommunication Union - Telecommunication Standardization Sector (ITU-T) Recommendation G.711 codifies the A-law and µ-law encoding schemes. For A-law encoding, the formula is

y = Ax/(1 + ln A), where 0 <= x <= 1/A, and
y = (1 + ln(Ax))/(1 + ln A), where 1/A <= x <= 1.
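A sketch of the A-law compression curve in Python, assuming the standard value A = 87.6 and normalized non-negative input (negative samples are handled symmetrically via the sign):

```python
import math

A = 87.6  # the A value used in G.711 A-law

def a_law_compress(x):
    """A-law compression of a normalized sample x in [0, 1].
    The curve is linear below 1/A and logarithmic above it."""
    if x < 1.0 / A:
        return A * x / (1.0 + math.log(A))
    return (1.0 + math.log(A * x)) / (1.0 + math.log(A))

# The two segments meet continuously at x = 1/A, and x = 1 maps to y = 1.
print(abs(a_law_compress(1.0) - 1.0) < 1e-9)  # True
```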

For µ-law transmission, the signal is encoded according to

y = ln(1 + µx)/ln(1 + µ), where 0 <= x <= 1.

In standard telephone work, µ is set to 255. The result is an 8-bit-per-sample signal with a dynamic range approximately equal to that of 12-bit PCM.
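The µ-law curve and its inverse are a one-line pair in Python; this sketch uses µ = 255 as in standard telephony (again assuming normalized non-negative input):

```python
import math

MU = 255  # the mu value used in standard telephone work

def mu_law_compress(x):
    """mu-law compression of a normalized sample x in [0, 1]."""
    return math.log(1.0 + MU * x) / math.log(1.0 + MU)

def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return ((1.0 + MU) ** y - 1.0) / MU

# Full-scale input maps exactly to 1.0, and expansion undoes compression.
print(mu_law_compress(1.0))                         # 1.0
print(abs(mu_law_expand(mu_law_compress(0.3)) - 0.3) < 1e-9)  # True
```

Small samples get most of the code space: the slope of the curve near zero is µ/ln(1+µ), about 46, which is why 8 bits can cover the dynamic range of roughly 12 linear bits.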


11.2.3.2 Delta Modulation and Adaptive Delta Pulse Code Modulation

Delta modulation encodes not the value of each sample, but the difference between one sample and the next. In most cases, this difference is much smaller than the sample itself, so fewer bits are needed to encode the difference than the complete sample value. Its variation, adaptive differential pulse-code modulation, or ADPCM, handles both signals that change quickly and signals that change slowly: the quantization step size between adjacent samples varies according to the signal itself. In other words, if the waveform is changing rapidly, larger steps are used. ADPCM typically achieves compression ratios of 2:1 compared to µ-law or A-law PCM.
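A minimal sketch of the delta-modulation idea in Python, showing the encode/decode pair without the adaptive step-size quantizer that full ADPCM adds:

```python
import math

def delta_encode(samples):
    """Encode the difference between each sample and the previous one
    (simple delta modulation; a real coder would also quantize the diffs)."""
    diffs, prev = [], 0.0
    for s in samples:
        diffs.append(s - prev)
        prev = s
    return diffs

def delta_decode(diffs):
    """Rebuild the samples by accumulating the differences."""
    out, acc = [], 0.0
    for d in diffs:
        acc += d
        out.append(acc)
    return out

# For a slowly varying signal, the differences are much smaller than the
# samples themselves, so they need fewer bits to quantize.
samples = [math.sin(2 * math.pi * n / 200) for n in range(100)]
diffs = delta_encode(samples)
print(max(abs(d) for d in diffs) < max(abs(s) for s in samples))  # True
```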

There are also many other encoding schemes that compress digital audio (see F. Francois, 1995 for details). Table 1 summarizes the main formats used for real-time audio.


Table 1. Typical formats for real-time audio

Name    Encoding method    Bits per sample    Sampling rate (kHz)    Bandwidth (kbps)
G.711   A-law PCM          8                  8                      64
G.721   ADPCM              4                  8                      32
G.722   SB-ADPCM           14                 16                     64, 56, 48
G.728   LD-CELP            14                 16                     16




<shaohong@csgrad.cs.vt.edu>
Last modified: Sun Dec 8 1996