Sound is produced by the vibration of matter. Audio refers to sound within the human hearing range. An audio signal in the natural world is analog: it is continuous in both time and amplitude. To be stored and processed by computers and transmitted over computer networks, the analog signal must be converted to digital form. First, the audio signal is converted to an analog electric signal by a microphone. Then, this analog electric signal is digitized by an Analog-to-Digital Converter (ADC). The analog-to-digital conversion comprises two steps: (1) sampling and (2) quantization. The resulting digital audio may then be encoded with different schemes to compress the file size.
To sample a signal means to examine it at a particular point in time, as shown in Figure 1. Sampling usually happens at equally spaced points in time; the spacing between them is called the sampling interval. The reciprocal of the sampling interval is called the sampling frequency or sampling rate. The sampling interval is measured in seconds, and the sampling rate in Hz, that is, samples per second.
In Figure 1, the sampling interval is 1 usec (10^-6 sec), or in other words the sampling frequency is 1 MHz (10^6 Hz). This means that the ADC samples this sine wave every 1 usec.
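To make the relationship concrete, the short Python sketch below (our own illustration; the variable names are not from the text or any standard) samples the 20 kHz sine wave of Figure 1 once every microsecond and confirms that the sampling rate is the reciprocal of the sampling interval.

```python
import numpy as np

sampling_interval = 1e-6                 # seconds between samples
sampling_rate = 1 / sampling_interval    # reciprocal: 10^6 Hz = 1 MHz

signal_freq = 2e4                        # 20 kHz sine wave, as in Figure 1
t = np.arange(0.0, 1e-3, sampling_interval)    # 1 ms worth of sample times
samples = np.sin(2 * np.pi * signal_freq * t)  # the sampled values

print(f"{sampling_rate:.0f} samples/s -> {len(samples)} samples in 1 ms")
```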
Assume the analog signal has a highest frequency of f; to reconstruct the original analog signal faithfully, the sampling rate must be at least 2f. This is known as the sampling theorem. In Figure 1, the frequency of the sine wave is 2x10^4 Hz and the sampling frequency is 10^6 Hz, which is much greater than 2x2x10^4 Hz, so this sine wave can be faithfully reconstructed back into an analog signal.
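The criterion can be checked mechanically; the sketch below is a minimal illustration of the 2f rule, with a hypothetical helper name of our choosing.

```python
def meets_nyquist(sampling_rate_hz: float, highest_freq_hz: float) -> bool:
    """Sampling theorem: the sampling rate must be at least twice the
    highest frequency contained in the analog signal."""
    return sampling_rate_hz >= 2 * highest_freq_hz

print(meets_nyquist(1e6, 2e4))   # True: 10^6 Hz >= 2 * 2x10^4 Hz (Figure 1)
print(meets_nyquist(3e4, 2e4))   # False: sampling too slowly would cause aliasing
```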
Because human hearing is limited to a range of 20 Hz to 20 kHz, the sampling frequency for CD-quality audio is 44.1 kHz. Since human speech is limited to roughly 20 Hz to 3000 Hz, an 8000 Hz sampling frequency is high enough for telephony-quality audio.
To quantize a signal means to determine the signal's value to
some degree of accuracy. Figure 2 shows the same analog signal being quantized.
The digital signal is defined only at the points at which it is sampled.
The height of each vertical bar can take on only certain values, shown by the horizontal dashed lines; these quantized values are sometimes higher and sometimes lower than the original signal, indicated by the dashed curve.
In Figure 2, 11 quantization levels are used; since 2^3 = 8 < 11 <= 2^4 = 16, 4 bits are needed to encode each sample.
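As a rough illustration of the idea (our own sketch, not code from the text), the fragment below snaps samples to the nearest of a fixed set of equally spaced levels and shows why 11 levels call for 4 bits per sample.

```python
import math
import numpy as np

def quantize_uniform(samples, levels):
    """Snap each sample (assumed to lie in [-1, 1]) to the nearest of
    `levels` equally spaced amplitude values (the horizontal dashed lines)."""
    grid = np.linspace(-1.0, 1.0, levels)
    indices = np.abs(samples[:, None] - grid[None, :]).argmin(axis=1)
    return grid[indices], indices

levels = 11
bits_per_sample = math.ceil(math.log2(levels))   # 2^3 = 8 < 11 <= 16 = 2^4
print(bits_per_sample)                           # 4

t = np.arange(0.0, 1e-4, 1e-6)
samples = np.sin(2 * np.pi * 2e4 * t)
quantized, codes = quantize_uniform(samples, levels)
```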
If the height of each bar is translated into a digital number, then the signal is said to be represented by pulse-code modulation, or PCM.
The difference between a quantized representation and the original analog signal is called quantization noise. The more bits used to quantize a PCM signal, the lower the quantization noise and the clearer the signal sounds.
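The following sketch (an illustration under our own simplifying assumptions, using uniform quantization over [-1, 1]) measures the RMS quantization noise for a few bit depths and shows it shrinking as bits are added.

```python
import numpy as np

def quantization_noise_rms(samples, bits):
    """Quantize to 2**bits uniform levels over [-1, 1] and return the
    RMS difference from the original signal (the quantization noise)."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)
    quantized = np.round((samples + 1.0) / step) * step - 1.0
    return np.sqrt(np.mean((quantized - samples) ** 2))

t = np.arange(0.0, 1e-3, 1e-6)
samples = np.sin(2 * np.pi * 2e4 * t)
for bits in (4, 8, 16):
    print(bits, quantization_noise_rms(samples, bits))
```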
Using a higher sampling frequency and more bits for quantization produces better-quality digital audio, but for the same length of audio the file is much larger than for low-quality audio. For example, CD-quality audio uses a 44.1 kHz sampling rate and 16-bit amplitude values. The resulting aggregate bit rate (bits per second) of a stereophonic (2-channel) CD-audio stream is thus 44.1*16*2 = 1,411.2 kbps. Telephony-quality audio, on the other hand, uses an 8 kHz sampling rate and 8-bit amplitude values, so one second of speech takes only 8*8 = 64 kbits.
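This arithmetic generalizes to a one-line formula; the helper below (the function name is ours) reproduces both figures.

```python
def pcm_bit_rate_bps(sampling_rate_hz, bits_per_sample, channels):
    """Aggregate PCM bit rate in bits per second."""
    return sampling_rate_hz * bits_per_sample * channels

print(pcm_bit_rate_bps(44_100, 16, 2))   # 1,411,200 bps = 1,411.2 kbps (stereo CD)
print(pcm_bit_rate_bps(8_000, 8, 1))     # 64,000 bps = 64 kbps (telephony)
```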
PCM is an uncompressed audio format. Real-world constraints may make it impossible to handle the full bit stream of, for example, CD-quality audio. In the following, we introduce some other audio encoding methods that are used to compress digital audio.
For speech signals, a system which works with a quantization
step size that increases logarithmically with the level of the signal
is widely used.
This allows a larger range of values to be covered with the same number
of bits.
The International Telecommunication Union - Telecommunication Standardization Sector (ITU-T) Recommendation G.711 codifies the A-law and μ-law encoding schemes. For A-law encoding, the formula is

F(x) = sgn(x) · A|x| / (1 + ln A)             for |x| < 1/A
F(x) = sgn(x) · (1 + ln(A|x|)) / (1 + ln A)   for 1/A <= |x| <= 1

and for μ-law encoding it is

F(x) = sgn(x) · ln(1 + μ|x|) / ln(1 + μ)      for -1 <= x <= 1.

In standard telephone work, μ is set to 255. The result is an 8-bit-per-sample signal that provides approximately the dynamic range associated with 12-bit PCM.
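As an illustration, a minimal sketch of the μ-law companding curve is given below (our own code, not the G.711 reference implementation; it applies only the continuous formula and omits the segmented 8-bit encoding used on real telephone links).

```python
import numpy as np

MU = 255.0  # value used in standard telephone work

def mu_law_compress(x, mu=MU):
    """Continuous mu-law curve: F(x) = sgn(x) * ln(1 + mu|x|) / ln(1 + mu)."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_expand(y, mu=MU):
    """Inverse of the companding curve."""
    y = np.asarray(y, dtype=float)
    return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu

x = np.linspace(-1.0, 1.0, 9)
roundtrip = mu_law_expand(mu_law_compress(x))   # small values keep finer resolution
print(np.allclose(roundtrip, x))                # True (no 8-bit quantization applied)
```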
The delta modulation method encodes not the value of each sample but the difference between one sample and the next. In most cases the difference between successive samples is much smaller than the sample itself, so fewer bits are needed to encode the difference than to encode the complete sample value. Its variation, adaptive differential pulse-code modulation, or ADPCM, handles both signals that change quickly and signals that change slowly: the step size used to encode the difference between adjacent samples varies according to the signal itself. In other words, if the waveform is changing rapidly, large quantization steps are used. ADPCM typically achieves compression ratios of 2:1 compared with μ-law or A-law PCM.
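A toy sketch of the difference-coding idea is shown below (our own illustration; it stores the differences losslessly and leaves out the difference quantization and adaptive step size that real DPCM and ADPCM coders add).

```python
import numpy as np

def delta_encode(samples):
    """Keep the first sample, then only the difference to the previous sample."""
    return samples[0], np.diff(samples)

def delta_decode(first, diffs):
    """Rebuild the waveform by accumulating the differences."""
    return np.concatenate(([first], first + np.cumsum(diffs)))

t = np.arange(0.0, 0.01, 1.25e-4)         # 8 kHz sampling of a 10 ms window
samples = np.sin(2 * np.pi * 100 * t)     # a slowly changing 100 Hz tone
first, diffs = delta_encode(samples)
restored = delta_decode(first, diffs)

# The differences span a much smaller range than the samples themselves,
# which is why fewer bits suffice to encode them.
print(np.abs(diffs).max(), np.abs(samples).max())   # ~0.08 vs ~1.0
print(np.allclose(restored, samples))               # True
```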
There are also many other encoding schemes that compress digital
audio (see F. Francois, 1995 for details).
Table 1 summarizes the main formats used for real-time audio.
Name | Encoding method | Bits per sample | Sampling rate (kHz) | Bandwidth (kbps)
---|---|---|---|---
G.711 | A-law PCM | 8 | 8 | 64
G.721 | ADPCM | 4 | 8 | 32
G.722 | SB-ADPCM | 14 | 16 | 64, 56, 48
G.728 | LD-CELP | 14 | 16 | 16