Tomasz Stachowiak
1. Introduction
2. Technology
2.1 H.263 video compression
2.1.1 H.263 vs. H.261
2.1.2 Negotiable options
2.2 Audio compression
2.2.1 GSM 06.10
2.2.2 Intel/DVI ADPCM
3. Project
3.1 Audio-video client
3.1.1 VOD system architecture overview
3.1.2 Overview
3.1.3 New elements
3.1.4 Synchronization
3.1.5 Random access
3.2 H.263 stream offsetting
The Video On Demand (VOD) project is only one part of my internship at NPAC. My primary responsibility is the Conferencing and Collaboration System. However, I needed to implement a video and audio synchronization algorithm for the videoconferencing tools, and I ran into problems starting this work at the conference level, so I decided to implement it first in a better-known environment, the VOD system. This also let me create tools that could later be integrated with this project. Afterwards I made further changes and improvements to the existing VOD software related to the video and audio compression algorithms: H.263, GSM 06.10 and ADPCM.
With H.263 it is possible to achieve the same quality as H.261 with 30-50% of the bit usage. Most of this gain is due to half-pel prediction and the negotiable options in H.263. H.263 also has less overhead and improved VLC tables.
These options are negotiable. This means that the decoder signals to the encoder which of the options it is able to decode. If the encoder supports any of these options, it can then turn them on, and each option used increases the quality of the decoded video sequence.
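The negotiation described above amounts to a set intersection: the encoder enables only those options that it implements and that the decoder has announced. The following is a minimal sketch of that idea, not the actual signalling used by any H.263 implementation. The option names are those of the four optional modes of the 1996 H.263 Recommendation, and the function name `negotiate` is hypothetical.

```python
# The four optional modes of H.263 (1996); names are written here as
# plain strings for illustration only.
H263_OPTIONS = {
    "unrestricted_motion_vectors",
    "syntax_based_arithmetic_coding",
    "advanced_prediction",
    "pb_frames",
}

def negotiate(encoder_supports, decoder_supports):
    """Return the options the encoder may switch on.

    An option is usable only if the encoder implements it AND the
    decoder has signalled that it can decode it.
    """
    unknown = (set(encoder_supports) | set(decoder_supports)) - H263_OPTIONS
    if unknown:
        raise ValueError("unknown option(s): %s" % sorted(unknown))
    return sorted(set(encoder_supports) & set(decoder_supports))

if __name__ == "__main__":
    enabled = negotiate(
        encoder_supports={"advanced_prediction", "pb_frames"},
        decoder_supports={"pb_frames", "unrestricted_motion_vectors"},
    )
    print(enabled)
```

An empty result simply means the stream is coded with the baseline algorithm only.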
GSM is a telephony standard defined by the European Telecommunications Standards Institute (ETSI). The GSM 06.10 compressor models the human speech system with two digital filters and an initial excitation. The linear-predictive short-term filter, which is the first stage of compression and the last stage of decompression, assumes the role of the vocal and nasal tract. It is excited by the output of a long-term predictive (LTP) filter that turns its input, the residual pulse excitation (RPE), into a mixture of glottal wave and voiceless noise. The GSM encoder compresses 160 16-bit voice samples into a 264-bit GSM frame. GSM 06.10 is faster than code-book lookup algorithms such as CELP. Its output bit rate is 13 kbps.
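The frame figures above can be checked with a little arithmetic. The sketch below assumes an 8 kHz sampling rate, which is the usual rate for GSM speech but is not stated in this report. Under that assumption a 160-sample frame lasts 20 ms, and a 264-bit frame works out to 13.2 kbps, close to the 13 kbps quoted above.

```python
SAMPLES_PER_FRAME = 160   # from the text
BITS_PER_SAMPLE = 16      # from the text
FRAME_BITS = 264          # from the text
SAMPLE_RATE_HZ = 8000     # assumption: not stated in the report

def gsm_frame_stats():
    """Derive compression ratio, frame duration and bit rate of one frame."""
    raw_bits = SAMPLES_PER_FRAME * BITS_PER_SAMPLE
    return {
        "raw_bits": raw_bits,
        "compression_ratio": raw_bits / FRAME_BITS,
        "frame_duration_ms": 1000 * SAMPLES_PER_FRAME / SAMPLE_RATE_HZ,
        "bit_rate_bps": FRAME_BITS * SAMPLE_RATE_HZ / SAMPLES_PER_FRAME,
    }

if __name__ == "__main__":
    for name, value in gsm_frame_stats().items():
        print(name, value)
```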
The ADPCM compression algorithm uses the correlation between adjacent
audio samples to reduce the bit rate. It transmits only the differences
between the samples and their predicted values, which have a smaller dynamic
range than the samples themselves. Predictor coefficients and
reconstruction levels are calculated dynamically from the coded signal.
This adaptation reduces the bandwidth but makes the technique
more susceptible to transmission errors. There are several ADPCM
standards, e.g. Intel/DVI and G.721. Intel/DVI is not very
computationally intensive and still gives good quality, even for
music.
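To illustrate the principle, here is a deliberately simplified 4-bit adaptive differential coder. It is not the Intel/DVI algorithm: the step-size rule (double on large codes, halve on zero codes) and the constants are assumptions chosen for brevity, whereas Intel/DVI uses fixed lookup tables. What the sketch does show is that the step size is adapted from the transmitted codes alone, so the decoder can track the encoder without side information.

```python
MIN_STEP, MAX_STEP, START_STEP = 4, 8192, 16   # illustrative constants

def _update(code, predicted, step):
    """Shared by encoder and decoder: rebuild a sample from a 4-bit code
    and adapt the step size using only the code itself."""
    magnitude = code & 7
    delta = (2 * magnitude + 1) * step // 2    # reconstruction level
    if code & 8:                               # bit 3 carries the sign
        delta = -delta
    predicted = max(-32768, min(32767, predicted + delta))
    if magnitude >= 4:                         # step too small: grow
        step = min(MAX_STEP, step * 2)
    elif magnitude == 0:                       # step too large: shrink
        step = max(MIN_STEP, step // 2)
    return predicted, step

def encode(samples):
    """Turn 16-bit samples into 4-bit codes (sign bit + 3-bit magnitude)."""
    predicted, step = 0, START_STEP
    codes = []
    for sample in samples:
        diff = sample - predicted              # only the difference is coded
        sign = 8 if diff < 0 else 0
        magnitude = min(7, abs(diff) // step)
        code = sign | magnitude
        codes.append(code)
        # Track the decoder's state so both sides stay in step.
        predicted, step = _update(code, predicted, step)
    return codes

def decode(codes):
    """Rebuild approximate samples from the 4-bit codes."""
    predicted, step = 0, START_STEP
    samples = []
    for code in codes:
        predicted, step = _update(code, predicted, step)
        samples.append(predicted)
    return samples
```

Because the decoder's state depends on every earlier code, a single corrupted code shifts all later samples, which is the susceptibility to transmission errors mentioned above.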
The VOD system consists of three major parts:
The VOD system architecture is shown in Figure 1.
Communication between the database and the clients is performed via Netscape
Navigator using standard HTTP and CGI mechanisms. Data from the server
is transmitted over socket (TCP) connections. Transmission control
is handled by an additional control connection based on the VOD client-server
protocol.
Figure 1 VOD System architecture
The AV client is implemented for the SGI Indy running IRIX 5.x. It uses H.263 video compression and either GSM or ADPCM audio compression. It supports the QCIF and CIF formats. It can play a movie, stop it, and seek to a random position in it.
In the existing movie formats, audio and video data were included in
the same stream. What we wanted, however, was to synchronize AV
data from two independent streams. Hence it was necessary to open
two server connections and to send and receive control messages on
two separate channels. Unfortunately, the database system offers information
only about the video stream, so the audio configuration had to
be included in the data file. This was done by adding a special audio
header, whose structure is presented in Table 1.
Table 1 Audio header
Name | Length (in bytes) | Description |
Title | 12 | String "npac-audio"; indicates that the file is an NPAC audio stream |
Rate | 2 | Sampling rate in samples per second |
Channels | 1 | Number of channels: 1 for mono, 2 for stereo |
Sample width | 2 | Sample width in bits |
Code format | 10 | String indicating the compression type; "adpcm" and "gsm" are supported |
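As an illustration, the header in Table 1 could be read as follows. The field widths come from the table (27 bytes in total). The byte order, the NUL padding of the two strings and the names `parse_audio_header` and `AudioHeader` are assumptions made for this sketch, since the report does not specify them.

```python
import struct
from collections import namedtuple

# Field widths follow Table 1: 12 + 2 + 1 + 2 + 10 = 27 bytes.
# ">" (big-endian, no alignment padding) is an assumption.
HEADER_FORMAT = ">12sHBH10s"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

AudioHeader = namedtuple(
    "AudioHeader", "title rate channels sample_width code_format")

def parse_audio_header(data):
    """Parse the audio header found at the start of an audio stream."""
    if len(data) < HEADER_SIZE:
        raise ValueError("truncated audio header")
    title, rate, channels, width, code = struct.unpack(
        HEADER_FORMAT, data[:HEADER_SIZE])
    # Assumption: strings are padded with NUL bytes to their fixed width.
    title = title.rstrip(b"\0").decode("ascii")
    code = code.rstrip(b"\0").decode("ascii")
    if title != "npac-audio":
        raise ValueError("not an NPAC audio stream")
    if channels not in (1, 2):
        raise ValueError("unsupported channel count: %d" % channels)
    if code not in ("adpcm", "gsm"):
        raise ValueError("unsupported compression type: %r" % code)
    return AudioHeader(title, rate, channels, width, code)

if __name__ == "__main__":
    sample = struct.pack(HEADER_FORMAT, b"npac-audio", 8000, 1, 16, b"gsm")
    print(parse_audio_header(sample))
```

The compressed audio portions start immediately after these 27 bytes.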
Synchronization is based on the internal mechanisms of the SGI Audio Library. The procedure that sends audio samples to the speaker port blocks until all previously sent samples have been played. It is therefore enough to send one audio portion and then decode one video frame to achieve synchronization and maintain the frame rate. The only problem is that this requires decoding one video frame at a time, which moves the responsibility for reading single frames from the decoder to the AV client.
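The pacing effect of the blocking audio write can be sketched with a simulated clock. `FakeAudioPort` below is a hypothetical stand-in that only models the blocking behaviour described above; it is not the SGI Audio Library API, and the numbers in the usage example are illustrative.

```python
class FakeAudioPort:
    """Hypothetical model of a blocking audio output port.

    write() returns only once all previously sent samples have been played.
    """
    def __init__(self, sample_rate):
        self.sample_rate = sample_rate
        self.clock = 0.0        # simulated time in seconds
        self.busy_until = 0.0   # when the queued samples finish playing

    def write(self, num_samples):
        # Block until everything sent earlier has been played ...
        self.clock = max(self.clock, self.busy_until)
        # ... then queue the new portion and return immediately.
        self.busy_until = self.clock + num_samples / self.sample_rate

    def spend(self, seconds):
        """Advance the simulated clock by the time spent on other work."""
        self.clock += seconds

def play(port, num_frames, samples_per_frame, decode_time):
    """Send one audio portion, then decode one video frame, repeatedly.

    Returns the simulated time at which each frame is ready for display.
    """
    display_times = []
    for _ in range(num_frames):
        port.write(samples_per_frame)   # the audio write paces the loop
        port.spend(decode_time)         # decode exactly one video frame
        display_times.append(port.clock)
    return display_times

if __name__ == "__main__":
    # 800 samples at 8000 Hz per frame corresponds to 10 frames per second.
    print(play(FakeAudioPort(8000), 5, 800, decode_time=0.03))
```

As long as decoding one frame takes less time than playing one audio portion, frames are produced at the audio rate without any explicit timer.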
To obtain this effect, some changes were necessary on both the decoder and the client side:
Since audio portions always have the same size, it is easy
to find the beginning of a portion knowing its size. The situation is different
for H.263 video compression. First, H.263 frames have different
lengths; second, because the compression is predictive, only INTRA
frames can be accessed in this way. The solution to this problem
is described in the chapter "H.263 stream offsetting".
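For the audio stream, the seek position is plain arithmetic. The sketch below assumes that the portions follow the 27-byte audio header of Table 1 directly; the function names and the example portion size are hypothetical.

```python
AUDIO_HEADER_SIZE = 27   # sum of the field lengths in Table 1

def audio_portion_offset(portion_index, portion_size,
                         header_size=AUDIO_HEADER_SIZE):
    """Byte offset of a fixed-size audio portion within the audio file."""
    if portion_index < 0:
        raise ValueError("portion index must not be negative")
    return header_size + portion_index * portion_size

def portion_for_time(seconds, sample_rate, samples_per_portion):
    """Index of the portion that contains the given playback time."""
    return int(seconds * sample_rate) // samples_per_portion

if __name__ == "__main__":
    # Example values only: 33-byte portions of 160 samples at 8000 Hz.
    index = portion_for_time(2.0, 8000, 160)
    print(index, audio_portion_offset(index, 33))
```

No such formula exists for the video stream, because the byte length of each H.263 frame is only known after it has been encoded.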
The H.263 system is intended to be used as a preview tool for MPEG movies. Hence H.263 sequences will be obtained by converting MPEG files. A preview tool must support random access, which makes it necessary to generate offset files.
To obtain offset files, it was necessary to add some new features to the H.263 encoder:
This mechanism allows the offset file to be generated in parallel with the movie conversion.
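The report does not describe the layout of the offset file, so the following is only a sketch of the general idea under an assumed layout: while encoding, the byte offset of every INTRA frame is recorded, and the client later looks up the last INTRA frame at or before the requested position. The class `OffsetIndex` and the helper `build_index` are hypothetical.

```python
import bisect

class OffsetIndex:
    """Hypothetical offset table with one entry per INTRA frame."""
    def __init__(self):
        self.frames = []    # frame numbers of INTRA frames, ascending
        self.offsets = []   # byte offset of each of those frames

    def record(self, frame_number, byte_offset, is_intra):
        """Called once per frame while encoding; keeps INTRA frames only."""
        if is_intra:
            self.frames.append(frame_number)
            self.offsets.append(byte_offset)

    def seek(self, target_frame):
        """Return (frame_number, byte_offset) of the last INTRA frame
        at or before target_frame."""
        i = bisect.bisect_right(self.frames, target_frame) - 1
        if i < 0:
            raise ValueError("no INTRA frame at or before the target")
        return self.frames[i], self.offsets[i]

def build_index(frame_sizes, intra_period):
    """Simulate an encoder that emits an INTRA frame every intra_period
    frames and records offsets while it writes the stream."""
    index = OffsetIndex()
    offset = 0
    for number, size in enumerate(frame_sizes):
        index.record(number, offset, number % intra_period == 0)
        offset += size      # frames have different lengths
    return index

if __name__ == "__main__":
    index = build_index([500, 120, 90, 130, 480, 100], intra_period=4)
    print(index.seek(3), index.seek(5))
```

Decoding then starts at the returned offset, and the frames between that INTRA frame and the target are decoded but not displayed.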