Text media handling in RTP based real-time and message conferences

Real-time text is a medium in real-time conversational sessions. Text entered by participants in a session is transmitted in a time-sampled fashion, so that no specific user action is needed to cause transmission. This gives a direct flow of text that is suitable in a real-time conversational setting. The real-time text medium can be combined with other media in multimedia sessions. A number of multimedia sessions can be combined in a multi-party session. This memo specifies how the real-time text streams are handled in such multi-party calls. The description is mainly focused on the transport level, but also describes a few presentation level features.

Transport of real-time text is specified in RFC 4103, RTP Payload for Text Conversation. It makes use of RFC 3550, the Real-time Transport Protocol (RTP), for transport, and is usually used in the SIP (RFC 3261, Session Initiation Protocol) environment, although it is also used in other call control environments. Call control aspects in this specification are explained with examples from SIP. The specifications for multi-party text transport, identification and presentation are also valid for other call control environments where RTP and RTCP are used.

A very brief overview of functions for both real-time and messaging text handling in multi-party sessions is given in RFC 4597, Conferencing Scenarios. This specification builds on that description and indicates which existing protocol mechanisms should be used to implement multi-party handling of text in real-time sessions.

In the centralized conference model, one function co-ordinates the sessions with participants in the multi-party session. This function also controls media mixer functions for the media appearing in the session. The central function is common for control of all media, while the media mixers may work differently for each medium. The central function is called the Focus UA. It may be co-located in an advanced terminal that includes multi-party control functions, or it may be located separately. Many variants exist for setting up sessions that include the multipoint control centre. It is not within the scope of this description to describe these variants, but rather the media specific handling in the mixer required to handle multi-party calls.

The main principle for handling real-time text media in a centralized conference is that one RTP session for real-time text is established between the multipoint media control centre and each participant who is going to exchange real-time text with the others. Within each RTP session, text from each participant is transmitted from the media mixer as a separate RTP stream. All streams thus use the same destination address/port combination, but different RTP SSRCs, as described for the Translator function in Section 7.1 of RFC 3550. This method enables the receiver to freely select display characteristics of the text conversation.

General session control aspects for multi-party sessions are described in RFC 4575, A Session Initiation Protocol (SIP) Event Package for Conference State, and RFC 4579, Session Initiation Protocol (SIP) Call Control - Conferencing for User Agents. The nomenclature of these specifications is used here.

The Focus UA co-ordinates the media flow. Real-time text media from different sources are combined in one text media session by the Focus UA. The Focus UA acts as an RTP Translator as described in Section 7.1 of RFC 3550. The RTP text stream from each participant who transmits text is allocated one unique SSRC. The SSRC is used by the receiver to identify text packets originating from one source. Each RTP packet MUST contain text from only one source.

The redundancy mechanism for increased robustness used by the RFC 4103 transport makes use of the RTP sequence number for detection of loss. One sequence number series is maintained per RTP stream identified by one SSRC. The RTP Translation mechanism maintains a separate SSRC for each source RTP stream in the combined RTP session. Therefore the RTP Translation mechanism can be used for conveying text from multiple sources to one destination, while retaining the ability to detect and recover from loss and to identify text from the different sources.

As soon as a new member is added to the RTP session, its characteristics shall be transmitted in RTCP SDES reports according to Section 6.5 of RFC 3550. The RTCP SDES report SHOULD contain identification of the source represented by the SSRC identifier. This identification MUST contain the CNAME field and MAY contain the NAME field and other defined fields of the SDES report. A Focus UA SHOULD primarily convey SDES information received from the sources of the session members. When such information is not available, the Focus UA SHOULD compose CNAME and NAME information from available information from the SIP session with the participant.
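The per-source SSRC and sequence number handling described above can be illustrated with a minimal Python sketch. It is not a definitive implementation: the class names, the payload type value 98 and the choice to start each sequence series at 0 are assumptions made for readability (RFC 3550 recommends a random initial sequence number), and RFC 4103 redundancy is left out.

```python
import struct
from dataclasses import dataclass, field

RTP_VERSION = 2


@dataclass
class OutgoingStream:
    """One RTP stream toward a receiver, identified by a single SSRC."""
    ssrc: int
    next_seq: int = 0  # assumption: starts at 0 only to keep the example readable

    def take_seq(self) -> int:
        seq = self.next_seq
        self.next_seq = (self.next_seq + 1) & 0xFFFF  # 16-bit wrap-around
        return seq


@dataclass
class TextTranslator:
    """Hypothetical per-destination state of a Focus UA text translator."""
    streams: dict = field(default_factory=dict)  # source SSRC -> OutgoingStream

    def forward(self, source_ssrc: int, timestamp: int, payload: bytes,
                payload_type: int = 98) -> bytes:
        """Build one RTP packet that carries text from exactly one source."""
        stream = self.streams.setdefault(source_ssrc, OutgoingStream(source_ssrc))
        first_byte = RTP_VERSION << 6       # no padding, no extension, CC = 0
        second_byte = payload_type & 0x7F   # marker bit clear
        header = struct.pack("!BBHII", first_byte, second_byte,
                             stream.take_seq(), timestamp & 0xFFFFFFFF,
                             stream.ssrc)
        return header + payload
```

Because every source keeps its own sequence series, a gap in the sequence numbers of one SSRC indicates loss for that source only, and text from other sources is unaffected.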

All session participants MUST observe the SSRC field of incoming text RTP packets, and make note of what source they came from in order to be able to present text in a way that makes it easy to read text from each participant in a session, and get information about the source of the text.
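As an illustration of this receiver requirement, the following Python sketch sorts incoming text by SSRC. It is a simplified example under stated assumptions: the class name `TextReceiver` is hypothetical, the payload is taken to be plain UTF-8 T.140 text without RFC 4103 redundancy, and packets with padding or header extensions are not handled.

```python
import struct
from collections import defaultdict


class TextReceiver:
    """Hypothetical receiver that keeps one text buffer per SSRC."""

    def __init__(self):
        self.text_by_ssrc = defaultdict(str)

    def on_rtp_packet(self, packet: bytes) -> int:
        """Append the packet's text to its source's buffer; return the SSRC."""
        if len(packet) < 12:
            raise ValueError("RTP packet shorter than the fixed header")
        first_byte, _pt, _seq, _ts, ssrc = struct.unpack("!BBHII", packet[:12])
        if first_byte >> 6 != 2:
            raise ValueError("not an RTP version 2 packet")
        csrc_count = first_byte & 0x0F
        # Skip the CSRC list, if any; assumes no extension header or padding.
        payload = packet[12 + 4 * csrc_count:]
        self.text_by_ssrc[ssrc] += payload.decode("utf-8")
        return ssrc
```

Keeping a separate buffer per SSRC leaves the presentation choice (for example, one column or window per participant) to the receiving application.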

A source identity SHOULD be composed from available information sources and displayed together with the text as indicated in ITU-T T.140 Appendix . The source identity should primarily be taken from the NAME field of incoming SDES packets. If this information is not available, and the session is a two-party session, then the T.140 source identity SHOULD be composed from the SIP session participant information. For multi-party sessions, the source identity may be composed from local information if sufficient information is not available in the session. Applications may abbreviate the presented source identity to a form suitable for the available display.
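The order of preference for composing a source identity might be expressed as in the following Python sketch. The function name, its parameters and the use of plain truncation as the abbreviation method are all assumptions made for illustration; an application is free to abbreviate differently.

```python
from typing import Optional


def source_label(sdes_name: Optional[str],
                 sip_identity: Optional[str],
                 two_party: bool,
                 local_fallback: str,
                 max_width: Optional[int] = None) -> str:
    """Pick a display label for a text source (hypothetical helper)."""
    if sdes_name:
        # Preferred: the NAME field received in RTCP SDES.
        label = sdes_name
    elif two_party and sip_identity:
        # Two-party session without NAME: use SIP participant information.
        label = sip_identity
    else:
        # Otherwise fall back to locally composed information.
        label = local_fallback
    if max_width is not None and len(label) > max_width:
        label = label[:max_width]  # assumption: simple truncation
    return label
```

For example, `source_label(None, "Bob", True, "Remote")` yields `"Bob"`, while the same call with `two_party=False` yields `"Remote"`.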

Display space limitations and other considerations may call for an opportunity for the user to select what sources of text to present, at what stage in the reception process to display them and how to present them. The specification draft-hellstrom-text-preview specifies such presentation aspects.

UAs participating in sessions with real-time text SHOULD send SDES packets in RTCP giving values to appropriate identification fields. The NAME field should be given a value that is suitable as an identifier of text from the user of the UA.
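A minimal Python sketch of an RTCP SDES packet carrying CNAME and NAME, following the layout in Section 6.5 of RFC 3550, is shown below. The function name `build_sdes` is hypothetical, only a single chunk is built, and the example omits the SR or RR packet that must precede SDES in a compound RTCP packet.

```python
import struct
from typing import Optional

RTCP_PT_SDES = 202
SDES_CNAME = 1
SDES_NAME = 2


def build_sdes(ssrc: int, cname: str, name: Optional[str] = None) -> bytes:
    """Build an SDES packet with one chunk (illustrative helper)."""
    items = [(SDES_CNAME, cname)]
    if name:
        items.append((SDES_NAME, name))
    chunk = struct.pack("!I", ssrc)
    for item_type, text in items:
        data = text.encode("utf-8")
        if len(data) > 255:
            raise ValueError("SDES item text is limited to 255 octets")
        chunk += bytes([item_type, len(data)]) + data
    # The item list ends with a null octet, then pads to a 32-bit boundary.
    chunk += b"\x00"
    chunk += b"\x00" * (-len(chunk) % 4)
    # Header: V=2, P=0, SC=1; length is in 32-bit words minus one.
    header = struct.pack("!BBH", (2 << 6) | 1, RTCP_PT_SDES, len(chunk) // 4)
    return header + chunk
```

A call such as `build_sdes(ssrc, "alice@host.example", "Alice")` gives the receiver both the mandatory CNAME and a NAME value that can label the text on the display.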

This document makes no request of IANA. Note to RFC Editor: this section may be removed on publication as an RFC.

The security considerations valid for RFC 4103 and RFC 3550 are valid also for the multi-party sessions with text.

The congestion considerations described in RFC 4103 are valid also for multi-party use of the real-time text RTP transport. A risk of congestion may appear if a number of conference participants are actively transmitting text simultaneously, because this multi-party transmission method does not allow multiple sources of text to contribute to the same packet. In situations where there is a risk of congestion, the Focus UA MAY combine packets from the same source to increase the transmission interval per source up to one second. Local conference policy in the Focus UA may be used to decide which streams are selected for such transmission frequency reduction.
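The transmission frequency reduction could be sketched in Python as a per-source buffer whose sending interval is raised under congestion. This is only an illustration: the class and method names are hypothetical, the 0.3 second normal interval is taken from the default buffering time recommended in RFC 4103, and how congestion is detected is outside the example.

```python
class SourceBuffer:
    """Hypothetical per-source text buffer in the Focus UA."""

    NORMAL_INTERVAL = 0.3  # seconds; default buffering time in RFC 4103
    MAX_INTERVAL = 1.0     # seconds; upper bound allowed under congestion

    def __init__(self):
        self.interval = self.NORMAL_INTERVAL
        self.pending = []
        self.last_sent = None

    def set_congested(self, congested: bool) -> None:
        """Select the longer interval for this source while congested."""
        self.interval = self.MAX_INTERVAL if congested else self.NORMAL_INTERVAL

    def add(self, text: str) -> None:
        """Queue text received from this source."""
        self.pending.append(text)

    def poll(self, now: float):
        """Return the combined pending text if it is time to send, else None."""
        if not self.pending:
            return None
        if self.last_sent is not None and now - self.last_sent < self.interval:
            return None
        combined = "".join(self.pending)
        self.pending.clear()
        self.last_sent = now
        return combined
```

Each source has its own buffer, so text from different sources is never merged into one packet; only the packet rate per source is reduced.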