
Some Frequently Asked Questions

Is RTP a transport protocol or a kind of application protocol?

RTP has important properties of a transport protocol: it runs on end systems and it provides demultiplexing. It differs from transport protocols like TCP in that it (currently) does not offer any form of reliability or protocol-defined flow or congestion control. However, it provides the necessary hooks for adding reliability, where appropriate, and flow or congestion control. Some like to refer to this property as application-level framing (see D. Clark and D. Tennenhouse, "Architectural considerations for a new generation of protocols", SIGCOMM'90, Philadelphia). So far, RTP has mostly been implemented within applications, but that has no bearing on its role: TCP is still a transport protocol even if it is implemented as part of an application rather than the operating system kernel.

RTP does not ensure real-time delivery. So how come it is called a real-time protocol?

No end-to-end protocol, including RTP, can ensure in-time delivery. This always requires the support of lower layers that actually have control over resources in switches and routers. RTP provides functionality suited for carrying real-time content, e.g., a timestamp and control mechanisms for synchronizing different streams with timing properties.
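To make the timestamp and related fields concrete, here is a minimal Python sketch that unpacks the 12-byte fixed header of an RTP version 2 packet. The field layout follows the RTP version 2 specification; the function name parse_rtp_header, the two sample packets and the 8000 Hz clock rate are assumptions chosen for illustration, not part of any particular implementation.

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Unpack the 12-byte fixed header of an RTP version 2 packet."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronization source (SSRC).
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": ts,
        "ssrc": ssrc,
    }

# Two hypothetical audio packets from the same source, 160 samples apart.
CLOCK_RATE_HZ = 8000  # assumed sampling clock for this example
first = struct.pack("!BBHII", 0x80, 0, 1000, 160000, 0x12345678) + b"\x00" * 160
second = struct.pack("!BBHII", 0x80, 0, 1001, 160160, 0x12345678) + b"\x00" * 160

h1 = parse_rtp_header(first)
h2 = parse_rtp_header(second)
# Timestamps start at an arbitrary offset, so only differences are meaningful.
spacing_seconds = ((h2["timestamp"] - h1["timestamp"]) & 0xFFFFFFFF) / CLOCK_RATE_HZ
```

A receiver can use the timestamp spacing to schedule playout regardless of when the packets actually arrived; that is the sense in which RTP carries real-time content without guaranteeing in-time delivery.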

Is RTP an unreliable protocol? Are there any mechanisms provided for error recovery in RTP?

As currently defined, RTP does not define any mechanisms for recovering from packet loss. Such mechanisms are likely to be highly dependent on the packet content. For example, for audio, it has been suggested to add low-bit-rate redundancy, offset in time. For other applications, retransmission of lost packets may be appropriate. This requires no additions to RTP. RTP probably has the necessary header information (like sequence numbers) for some forms of error recovery by retransmission.
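As an illustration of how the sequence number could support loss detection, the following Python sketch lists the sequence numbers a receiver never saw. It is a simplified assumption-laden example, not a mechanism defined by RTP: the function name find_missing is hypothetical, packets are assumed to arrive mostly in order, and a packet that jumps backwards is treated as a late arrival or duplicate.

```python
def find_missing(received):
    """Return the 16-bit RTP sequence numbers skipped in `received`.

    `received` is the list of sequence numbers in arrival order.
    Sequence numbers wrap around from 65535 to 0.
    """
    missing = []
    expected = None
    for seq in received:
        if expected is not None:
            gap = (seq - expected) & 0xFFFF  # distance modulo 65536
            if gap >= 0x8000:
                # Looks like a jump backwards: late arrival or duplicate.
                if seq in missing:
                    missing.remove(seq)  # a late packet is no longer lost
                continue
            # Every number between expected and seq was skipped.
            missing.extend((expected + i) & 0xFFFF for i in range(gap))
        expected = (seq + 1) & 0xFFFF
    return missing

lost_across_wrap = find_missing([65533, 65534, 0, 1, 3])
lost_with_late_packet = find_missing([10, 12, 11, 13])
```

An application that can tolerate the extra delay could request retransmission of the numbers returned here; an audio tool would more likely conceal the gap or use redundant data instead.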

Can RTP run over IPng? ATM?

Yes. RTP contains no specific assumptions about the capabilities of the lower layers, except that they provide framing. It contains no network-layer addresses, so that RTP is not affected by addressing changes. Any additional lower-layer capabilities such as security or quality-of-service guarantees can obviously be used by applications employing RTP. There are several implementations of video tools that run RTP directly over AAL5. It should be noted that the RTCP CNAME field is currently based on the assumption that hosts have Internet-style domain names.

Why can't we just use TCP for audio and video?

For delivering audio and video for playback, TCP may be appropriate. Also, with sufficiently long buffering and adequate average throughput, near-real-time delivery using TCP can be successful, as practiced by the Netscape WWW browser. TCP may often run over highly lossy networks (e.g., the German X.25 network) with acceptable throughput, even though the uncompensated losses would make audio or video communication impossible.

However, for real-time delivery of audio and video, TCP and other reliable transport protocols such as XTP are inappropriate. The three main reasons are:

An additional small disadvantage is that the TCP and XTP headers are larger than a UDP header (20 bytes for TCP without options, 40 bytes for XTP 3.6 and 32 bytes for XTP 4.0, compared to 8 bytes for UDP). Also, these reliable transport protocols do not contain the necessary timestamp and encoding information needed by the receiving application, so that they cannot replace RTP. (They would not need the sequence number, as these protocols assure that no losses or reordering take place.)

While LANs often have sufficient bandwidth and low enough losses not to trigger these problems, TCP does not offer any advantages in that scenario either, except for the recovery from rare packet losses. Even in a LAN with no losses, TCP would suffer from the initial slow start delay.

Can't we just use XTP?

Many of the arguments parallel those in the previous section.

The question of the relationship of RTP and XTP appears to arise frequently. (This may simply be due to the word 'transport' in both protocol names.) However, XTP and RTP are not replacements for each other. XTP is designed as a general, configurable network and transport protocol for both reliable and unreliable data communications. RTP has no reliability mechanisms (although these could be added if desired for specific applications) and no flow control like the rate control in XTP. RTP is not intended for regular, reliable data transfer (where TCP or XTP might be used instead). For real-time data, where retransmission is usually not possible due to timing constraints, XTP would have to disable retransmission. Flow/congestion control for real-time data is most likely inappropriate as the rate of such sources is inherently given and not modifiable on the time-scale of transport-protocol flow control, as explained in the previous section. It should be noted that RTP supports mechanisms that allow a form of congestion control on longer time scales, e.g., by modifying the source encoder if network congestion is detected.
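The idea of congestion control on longer time scales can be sketched as a sender that adjusts its encoder bit rate each time it learns the loss rate seen by receivers. The Python sketch below is purely illustrative: the function name adapt_bitrate, the 5% and 1% loss thresholds, the step sizes and the rate limits are all assumptions and are not specified by RTP.

```python
def adapt_bitrate(current_bps, loss_fraction, min_bps=16_000, max_bps=64_000):
    """Pick the next encoder bit rate from the reported loss fraction.

    Assumed policy: cut the rate by a quarter when loss exceeds 5%,
    probe upwards by 2 kb/s when loss is below 1%, otherwise hold.
    """
    if not 0.0 <= loss_fraction <= 1.0:
        raise ValueError("loss_fraction must be between 0 and 1")
    if loss_fraction > 0.05:
        candidate = current_bps * 0.75      # congested: back off
    elif loss_fraction < 0.01:
        candidate = current_bps + 2_000     # lightly loaded: probe upwards
    else:
        candidate = current_bps             # in between: hold steady
    # Keep the result within what the (hypothetical) encoder supports.
    return int(max(min_bps, min(max_bps, candidate)))

rate = 64_000
for reported_loss in (0.10, 0.03, 0.0):
    rate = adapt_bitrate(rate, reported_loss)
```

Because the decision is made once per feedback report rather than per packet, it operates on a much slower time scale than transport-level flow control.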

RTP has no protocol state by itself and can thus be used over either connection-less networks, such as IP/UDP, or connection-oriented networks, such as XTP, ST-II or ATM (AAL3/4 or AAL5). Many real-time multimedia applications use multicast with a large fan-out, e.g., several hundred to thousands for a lecture or concert. Connection-oriented protocols like XTP have difficulty scaling to such a large number of receivers.

XTP does not offer timing or content type (media) information, and thus would need these services, as offered by RTP. XTP provides no RTP-like direct feedback of the received quality-of-service, and thus, again, would have to "import" these from another protocol. Looking at existing applications using XTP for real-time services confirms that they need to add a layer similar in content to the RTP data part "between" XTP and the actual media.
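The quality-of-service feedback mentioned above includes a "fraction lost" value in RTCP receiver reports, carried as an 8-bit fixed-point number (the loss fraction multiplied by 256, integer part only). A minimal Python sketch of that calculation follows; the function name fraction_lost and the clamp to 255 are choices made for this example.

```python
def fraction_lost(expected_interval, received_interval):
    """Loss since the previous report as an 8-bit fixed-point fraction.

    expected_interval: packets expected since the last report, derived
                       from the highest sequence number seen.
    received_interval: packets actually received in the same period.
    """
    lost = expected_interval - received_interval
    # Duplicates can make `lost` negative; report that as zero loss.
    if expected_interval == 0 or lost <= 0:
        return 0
    # Clamp so that total loss still fits in 8 bits (assumption of this sketch).
    return min(255, (lost << 8) // expected_interval)

report_value = fraction_lost(100, 95)   # 5 of 100 packets lost
approx_loss = report_value / 256        # what the sender reads back
```

The sender only recovers an approximation of the true 5% loss here, which is usually adequate for deciding whether to change the encoding.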

Is there an RTP library or kernel implementation?

RTP (in particular, the data part) is tightly coupled to the application, so that a kernel or library implementation makes little sense. However, NeVoT can be used as a linkable library that implements RTP for an audio tool, with a documented API. The sources to NeVoT, rtpdump and vic also contain RTCP processing modules which should be usable in other applications with minor modifications. Note also that the specification itself contains numerous code fragments. (Most of the other applications use older versions of RTP and thus should not be relied upon for new development.)

What are some of the differences between the VAT protocol and RTP?

The VAT protocol was originally implemented in the VAT audio tool and subsequently also in other audio tools such as NeVoT. It is currently the most frequently used packet format for audio on the MBONE. The VAT header format is only described in header files. (See the VAT and NeVoT sources for details.) Many aspects of RTP and the VAT protocol are similar, but RTP improves upon the VAT protocol in a number of ways:

A new version of VAT (currently in alpha-test) also implements RTP. As soon as there are a sufficient number of stable applications using RTP, it is anticipated that most Internet MBONE audio/video events will be transmitted using RTP.

What are the differences between RTP version 1 and 2?

Version 1 is of historical interest only. Applications should not be written for it. RTP version 2 is not backwards compatible with version 1. If you care, you can find a definition of version 1 in an old Internet draft.

Are there related ITU efforts?

Media formats:
G.711:
Audio encoding at 64 kb/s (mu-law and A-law).
H.261:
Video encoding.
H.263:
Video encoding, improved version of H.261.
H.324:
Audio and video over POTS at less than 20 kb/s.

For conferencing over ISDN:
H.221:
Frame structure for a 64 to 1920 kbit/s channel in audiovisual teleservices.
H.320:
Framework for transmitting audio and video over circuit-switched digital networks (primarily ISDN).
H.323:
H.320 over LAN.

For conference control, application and data sharing, there are a number of recommendations:

T.120:
Introduction to the audiographics and audiovisual conferencing recommendations.
T.121:
Generic application template.
T.122:
Multipoint communication service for audiographics and audiovisual conferencing service definition.
T.123:
Protocol stack for audiographics and audiovisual teleconference applications.
T.124:
Generic conference control.
T.125:
Multipoint communication service protocol specification.
T.126:
Still image protocol specification.
T.127:
Multipoint binary file transfer protocol.

Are there other efforts in using the Internet for real-time audio and video?

Too many, some may say. vat (versions 3.4 and earlier), one of the early (though still recent) Internet audio applications, uses mostly the same audio encodings as specified in the RTP profile, but a different protocol. There are a number of "Internet telephones" (usually for PCs) that use proprietary audio coding and protocols and are meant for point-to-point connections:

For near real-time distribution of audio, e.g., the on-demand delivery of music or news:

CU-SeeMe (for Windows PCs and the Macintosh) is a combined audio and video tool using reflectors rather than IP-level multicast.

RealAudio writes what currently applies to all tools:

If the packet loss is high, it may be due to a busy network. If this is the case, there is little you can do to remedy the situation other than to try connecting to the site at a later time.

A survey can be found at www.von.com.



Henning Schulzrinne